Abstract
Results
This paper presents the R/Bioconductor package minet (version 1.1.6), which provides a set of functions to infer mutual information networks from a dataset. Once fed with a microarray dataset, the package returns a network where nodes denote genes, edges model statistical dependencies between genes, and the weight of an edge quantifies the statistical evidence of a specific (e.g. transcriptional) gene-to-gene interaction. Four different entropy estimators are made available in the package minet (empirical, Miller-Madow, Schurmann-Grassberger and shrink) as well as four different inference methods, namely relevance networks, ARACNE, CLR and MRNET. The package also integrates accuracy assessment tools, such as F-scores, PR-curves and ROC-curves, in order to compare the inferred network with a reference one.
Conclusion
The package minet provides a series of tools for inferring transcriptional networks from microarray data. It is freely available from the Comprehensive R Archive Network (CRAN) as well as from the Bioconductor website.
Background
Modelling transcriptional interactions by large networks of interacting elements and determining how these interactions can be effectively learned from measured expression data are two important issues in systems biology [1]. It should be noted that by focusing only on transcript data, the inferred network should not be considered as a proper biochemical regulatory network, but rather as a gene-to-gene network where many physical connections between macromolecules might be hidden by shortcuts. In spite of some evident limitations, the bioinformatics community has made important advances in this domain over the last few years [2,3]. In particular, mutual information networks have been successfully applied to transcriptional network inference [4-6]. Such methods, which typically rely on the estimation of mutual information between all pairs of variables, have recently held the attention of the bioinformatics community for the inference of very large networks (up to several thousand nodes) [4,7-9].
R is a widely used open source language and environment for statistical computing and graphics [10] which has become a de facto standard in statistical modeling, data analysis, biostatistics and machine learning [11]. An important feature of the R environment is that it integrates generic data analysis and visualization functionalities with off-the-shelf packages implementing the latest advances in computational statistics. Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data [12], mainly based on the R programming language. This paper introduces the new R and Bioconductor package minet, where the acronym stands for Mutual Information NETwork inference. This package is freely available on the R CRAN package resource [10] as well as on the Bioconductor website [12].
1 Mutual information networks
Mutual information networks are a subcategory of network inference methods. The rationale of this family of methods is to infer a link between a pair of nodes if it has a high score based on mutual information [9].
Mutual information network inference proceeds in two steps. The first step is the computation of the mutual information matrix (MIM), a square matrix whose i, jth element

$$\mathrm{mim}_{ij} = I(X_i; X_j)$$

is the mutual information between $X_i$ and $X_j$, where $X_i \in X$, $i = 1,\dots,n$, is a discrete random variable denoting the expression level of the ith gene. The second step is the computation of an edge score for each pair of nodes by an inference algorithm that takes the MIM matrix as input.
The adoption of mutual information in network inference tasks can be traced back to Chow and Liu's tree algorithm [13,14]. Mutual information provides a natural generalization of the correlation since it is a nonlinear measure of dependency. Hence, with mutual information, generalized correlation networks (relevance networks [7]) and also conditional independence graphs (e.g. ARACNE [8]) can be built. An advantage of these methods is their ability to deal with up to several thousands of variables, even in the presence of a limited number of samples. This is made possible by the fact that the MIM computation requires only estimations of bivariate mutual information terms. Since each bivariate estimation can be computed quickly and has low variance even for a small number of samples, this family of methods is well suited to microarray data. Note that since mutual information is a symmetric measure, it is not possible to derive the direction of an edge using a mutual information network inference technique. Nevertheless, the orientation of the edges can be obtained by using algorithms like IC, which are well known in the graphical modelling community [15].
1.1 Relevance Network
The relevance network approach [7] has been introduced in gene clustering and was successfully applied to infer relationships between RNA expressions and chemotherapeutic susceptibility [6]. The approach consists in inferring a genetic network where a pair of genes {X_{i}, X_{j}} is linked by an edge if the mutual information I(X_{i}; X_{j}) is larger than a given threshold I_{0}. The complexity of the method is O(n^{2}) since all pairwise interactions are considered.
Note that this method does not eliminate all the indirect interactions between genes. For example, if gene X_{1 }regulates both gene X_{2 }and gene X_{3}, this would cause a high mutual information between the pairs {X_{1}, X_{2}}, {X_{1}, X_{3}} and {X_{2}, X_{3}}. As a consequence, the algorithm will set an edge between X_{2 }and X_{3 }although these two genes interact only through gene X_{1}.
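The thresholding rule and the indirect-edge effect described above can be illustrated with a minimal base R sketch. The matrix values, the function name relevance_network and the thresholds are hypothetical and chosen only for illustration; this is not the code used inside minet.

```r
# Toy symmetric mutual information matrix (hypothetical values, in nats).
# X1 is assumed to regulate both X2 and X3, so all three pairs share information.
genes <- c("X1", "X2", "X3")
mim <- matrix(c(0.0, 0.8, 0.7,
                0.8, 0.0, 0.5,
                0.7, 0.5, 0.0),
              nrow = 3, byrow = TRUE, dimnames = list(genes, genes))

# Relevance network: link a pair of genes when its mutual information exceeds I0.
relevance_network <- function(mim, I0) {
  adj <- mim > I0
  diag(adj) <- FALSE  # no self-loops
  adj
}

# With a low threshold the indirect edge X2-X3 is kept ...
relevance_network(mim, I0 = 0.4)
# ... and it only disappears when the threshold happens to exceed its weight.
relevance_network(mim, I0 = 0.6)
```

In this toy case a higher threshold removes the indirect edge, but only because its weight is the smallest one; the method itself has no mechanism to tell direct from indirect interactions.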
1.2 CLR Algorithm
The CLR algorithm [4] is an extension of the relevance network approach. This algorithm computes the mutual information for each pair of genes and derives a score related to the empirical distribution of the MI values. In particular, instead of considering the information $I(X_i; X_j)$ between genes $X_i$ and $X_j$, it takes into account the score $z_{ij} = \sqrt{z_i^2 + z_j^2}$, where

$$z_i = \max\left(0, \frac{I(X_i; X_j) - \mu_i}{\sigma_i}\right)$$

and $\mu_i$ and $\sigma_i$ are respectively the sample mean and standard deviation of the empirical distribution of the values $I(X_i; X_k)$, $k = 1,\dots,n$. The CLR algorithm was successfully applied to decipher the E. coli transcriptional regulatory network (TRN) [4]. CLR has a complexity of $O(n^2)$ once the MIM is computed.
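As an illustration, the CLR score can be computed from a MIM in a few lines of base R. This is a sketch under stated assumptions: the function name clr_scores is hypothetical, the row statistics include the diagonal entry, and rows with zero standard deviation are not handled. The clr function of the package may differ in such details.

```r
# Toy symmetric mutual information matrix (hypothetical values, in nats).
mim <- matrix(c(0.0, 0.8, 0.7,
                0.8, 0.0, 0.5,
                0.7, 0.5, 0.0),
              nrow = 3, byrow = TRUE)

clr_scores <- function(mim) {
  mu    <- rowMeans(mim)        # sample mean of I(Xi; Xk), k = 1..n
  sigma <- apply(mim, 1, sd)    # sample standard deviation of the same values
  # z[i, j] = max(0, (I(Xi; Xj) - mu_i) / sigma_i); vectors recycle along rows
  z <- pmax((mim - mu) / sigma, 0)
  # z_ij = sqrt(z_i^2 + z_j^2), symmetric by construction
  sqrt(z^2 + t(z)^2)
}

z <- clr_scores(mim)
```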
1.3 ARACNE
The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [8] is based on the Data Processing Inequality [16]. This inequality states that, if gene $X_1$ interacts with gene $X_3$ through gene $X_2$, then

$$I(X_1; X_3) \le \min\big(I(X_1; X_2),\, I(X_2; X_3)\big)$$

ARACNE starts by assigning to each pair of nodes a weight equal to the mutual information. Then, as in relevance networks, all edges for which $I(X_i; X_j) < I_0$ are removed, with $I_0$ a given threshold. Finally, the weakest edge of each triplet is interpreted as an indirect interaction and is removed if the difference between the two lowest weights is above a threshold $W_0$. Note that by increasing $I_0$ the number of inferred edges is decreased, while the opposite effect is obtained by increasing $W_0$.
If the network is a tree and only pairwise interactions are present, the method guarantees the reconstruction of the original network, once it is provided with the exact MIM. ARACNE's complexity is O(n^{3}) since the algorithm considers all triplets of genes. In [8], the method was able to recover components of the TRN in mammalian cells and outperformed Bayesian networks and relevance networks on several inference tasks.
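The pruning step can be sketched in base R as follows. The function name aracne_sketch is hypothetical, the triplets are enumerated naively, and all triplets are examined on the thresholded weights before any edge is removed (an assumption about the order of operations). It is meant only to make the role of I0 and W0 concrete, not to reproduce the package's C++ implementation.

```r
# Toy symmetric mutual information matrix (hypothetical values, in nats).
mim <- matrix(c(0.0, 0.8, 0.7,
                0.8, 0.0, 0.5,
                0.7, 0.5, 0.0),
              nrow = 3, byrow = TRUE)

aracne_sketch <- function(mim, I0 = 0, W0 = 0) {
  n <- nrow(mim)
  w <- mim
  diag(w) <- 0
  w[w < I0] <- 0                           # relevance-network style thresholding
  drop <- matrix(FALSE, n, n)              # edges marked as indirect
  if (n >= 3) {
    triplets <- utils::combn(n, 3)         # one column per triplet of genes
    for (idx in seq_len(ncol(triplets))) {
      i <- triplets[1, idx]; j <- triplets[2, idx]; k <- triplets[3, idx]
      edges <- rbind(c(i, j), c(i, k), c(j, k))
      wt <- w[edges]                       # weights of the three edges
      o <- order(wt)
      # remove the weakest edge if it is weaker than the second weakest by more than W0
      if (wt[o[2]] - wt[o[1]] > W0) {
        weakest <- edges[o[1], ]
        drop[weakest[1], weakest[2]] <- TRUE
        drop[weakest[2], weakest[1]] <- TRUE
      }
    }
  }
  w[drop] <- 0
  w
}

net_strict  <- aracne_sketch(mim, W0 = 0)    # the weakest edge (2,3) is removed
net_lenient <- aracne_sketch(mim, W0 = 0.3)  # a larger W0 keeps it
```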
1.4 MRNET
MRNET [9] infers a network using the maximum relevance/minimum redundancy (MRMR) feature selection method [17,18]. The idea consists in performing a series of supervised MRMR gene selection procedures where each gene in turn plays the role of the target output.
The MRMR method has been introduced in [17,18] together with a best-first search strategy for performing filter selection in supervised learning problems. Consider a supervised learning task where the output is denoted by $Y$ and $V$ is the set of input variables. The method ranks the set $V$ of inputs according to a score that is the difference between the mutual information with the output variable $Y$ (maximum relevance) and the average mutual information with all the previously ranked variables (minimum redundancy). The rationale is that direct interactions (i.e. the most informative variables to the target $Y$) should be well ranked whereas indirect interactions (i.e. the ones with redundant information with the direct ones) should be badly ranked by the method. The greedy search starts by selecting the variable $X_i$ having the highest mutual information to the target $Y$. The second selected variable $X_j$ will be the one with a high information $I(X_j; Y)$ to the target and at the same time a low information $I(X_j; X_i)$ to the previously selected variable. In the following steps, given a set $S$ of selected variables, the criterion updates $S$ by choosing the variable

$$X_j^{MRMR} = \arg\max_{X_j \in V \setminus S} s_j$$

that maximizes the score

$$s_j = u_j - r_j \qquad (4)$$

where $u_j$ is a relevance term and $r_j$ is a redundancy term. More precisely,

$$u_j = I(X_j; Y)$$

is the mutual information of $X_j$ with the target variable $Y$, and

$$r_j = \frac{1}{|S|} \sum_{X_k \in S} I(X_j; X_k)$$

measures the average redundancy of $X_j$ to each already selected variable $X_k \in S$. At each step of the algorithm, the selected variable is expected to allow an efficient trade-off between relevance and redundancy. It has been shown in [19] that the MRMR criterion is an optimal "pairwise" approximation of the conditional mutual information between any two genes $X_i$ and $X_j$ given the set $S$ of selected variables, $I(X_i; X_j \mid S)$.
The MRNET approach consists in repeating this selection procedure for each target gene by putting Y = X_{i} and V = X \ {X_{i}}, i = 1,...,n, where X is the set of the expression levels of all genes. For each pair {X_{i}, X_{j}}, MRMR returns two (not necessarily equal) scores s_{i} and s_{j} according to (4). The score of the pair {X_{i}, X_{j}} is then computed by taking the maximum of s_{i} and s_{j}. A specific network can then be inferred by deleting all the edges whose score lies below a given threshold I_{0} (as in relevance networks, CLR and ARACNE). Thus, the algorithm infers an edge between X_{i} and X_{j} either when X_{i} is a well-ranked predictor of X_{j} (s_{i} > I_{0}) or when X_{j} is a well-ranked predictor of X_{i} (s_{j} > I_{0}).
An effective implementation of the best-first search for quadratic problems is available in [20]. This implementation demands an O(f × n) complexity for selecting f features using a best-first search strategy. It follows that MRNET has an O(f × n^{2}) complexity since the feature selection step is repeated for each of the n genes. In other terms, the complexity ranges between O(n^{2}) and O(n^{3}) according to the value of f. In practice, the selection of features stops once a variable obtains a negative score.
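The procedure can be made concrete with a naive base R sketch operating directly on a MIM. The names mrmr_scores and mrnet_sketch are hypothetical, the greedy loop recomputes the redundancy term at every step instead of using the efficient search of [20], and unselected variables simply keep a score of zero. It should therefore be read as an illustration of the scoring rule rather than as the package's implementation.

```r
# Toy symmetric mutual information matrix (hypothetical values, in nats).
mim <- matrix(c(0.0, 0.8, 0.7,
                0.8, 0.0, 0.5,
                0.7, 0.5, 0.0),
              nrow = 3, byrow = TRUE)

# Greedy MRMR ranking of all variables with respect to one target gene.
mrmr_scores <- function(mim, target) {
  n <- nrow(mim)
  candidates <- setdiff(seq_len(n), target)
  selected <- integer(0)
  scores <- numeric(n)
  while (length(candidates) > 0) {
    u <- mim[candidates, target]                       # relevance u_j
    r <- if (length(selected) > 0) {
      rowMeans(mim[candidates, selected, drop = FALSE])  # redundancy r_j
    } else 0
    s <- u - r
    best <- which.max(s)
    if (s[best] < 0) break                             # stop at the first negative score
    scores[candidates[best]] <- s[best]
    selected <- c(selected, candidates[best])
    candidates <- candidates[-best]
  }
  scores
}

# MRNET: repeat the selection for each target and keep the larger of the two scores.
mrnet_sketch <- function(mim) {
  s <- sapply(seq_len(nrow(mim)), function(i) mrmr_scores(mim, i))
  pmax(s, t(s))
}

net <- mrnet_sketch(mim)
```

On this toy matrix the pair (2,3), whose information is redundant with gene 1, receives a zero score, while the two edges involving gene 1 keep their weights.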
Implementation of the inference algorithms in minet
All the algorithms discussed above are available in the minet package. The RELNET algorithm is implemented by simply running the command build.mim, which returns the MIM matrix; this matrix can be considered as a weighted adjacency matrix of the network. CLR, ARACNE and MRNET are implemented by the commands clr(mim), aracne(mim) and mrnet(mim) respectively, each of which returns a weighted adjacency matrix of the network.
It should be noted that the modularity of the minet package makes it possible to assess network inference methods on similarity matrices other than MIM [21].
2 Mutual information estimation
An information-theoretic network inference technique aims at identifying connections between two genes (variables) by estimating the amount of information common to any pair of genes. Mutual information is a measure which calculates dependencies between two discrete random variables. An important property of this measure is that it is not restricted to the identification of linear relations between the random variables [16].
If $X$ is a continuous random variable taking values between $a$ and $b$, the interval $[a, b]$ can be discretized by partitioning it into $|\mathcal{X}|$ subintervals, called bins, where the symbol $\mathcal{X}$ denotes the bin index vector. We also use $nb(x_k)$ to denote the number of data points in the kth bin and the symbol $m = \sum_{k \in \mathcal{X}} nb(x_k)$ to denote the number of samples. If $X$ is a random vector, each element $X_i$ can be discretized separately into $|\mathcal{X}_i|$ bins with index vector $\mathcal{X}_i$.
Let $X$ be a random vector and $p$ a probability measure. The i, jth element of the mutual information matrix (MIM) is defined by

$$\mathrm{mim}_{ij} = H(X_i) + H(X_j) - H(X_i, X_j) = I(X_i; X_j) \qquad (5)$$

where the entropy of a random variable $X$ is defined as

$$H(X) = -\sum_{k \in \mathcal{X}} p(x_k) \log p(x_k)$$

and $I(X_i; X_j)$ is the mutual information between the random variables $X_i$ and $X_j$.
Hence, each mutual information calculation demands the estimation of three entropy terms (Eq. 5). A fast entropy estimation is therefore essential for an effective network inference based on MI. Entropy estimation has gained much interest in feature selection and network inference over the last decade [22]. Most approaches focus on reducing the bias inherent to entropy estimation. In this section, some of the fastest and most used entropy estimators are presented. Other interesting approaches can be found in [22-26].
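To make the three-entropy decomposition concrete, the following base R sketch builds a MIM from already discretized columns using the plug-in entropy. The function names and the toy data are hypothetical, the joint variable is encoded by pasting the two labels together, and the double loop is deliberately naive; it is not how build.mim is implemented.

```r
# Plug-in entropy (in nats) of a discrete vector.
entropy_plugin <- function(x) {
  p <- as.numeric(table(x)) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# I(X; Y) = H(X) + H(Y) - H(X, Y); the joint variable is the pair of labels.
mutual_information <- function(x, y) {
  entropy_plugin(x) + entropy_plugin(y) - entropy_plugin(paste(x, y))
}

# Naive MIM over the columns of a discretized dataset (rows are samples).
build_mim_sketch <- function(data) {
  n <- ncol(data)
  mim <- matrix(0, n, n, dimnames = list(colnames(data), colnames(data)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    mim[i, j] <- mutual_information(data[, i], data[, j])
  }
  mim
}

# Hypothetical discretized data: b copies a, c is independent of a.
toy <- data.frame(a = c(0, 0, 1, 1), b = c(0, 0, 1, 1), c = c(0, 1, 0, 1))
mim <- build_mim_sketch(toy)
```

Note that in this sketch the diagonal holds the entropy of each variable, since I(X; X) = H(X).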
2.1 Empirical and Miller-Madow corrected estimators
The empirical estimator (also called "plug-in", "maximum likelihood" or "naïve", see [23]) is the entropy of the empirical distribution:

$$\hat{H}^{emp}(X) = -\sum_{k \in \mathcal{X}} \frac{nb(x_k)}{m} \log \frac{nb(x_k)}{m}$$

Note that, because of the concavity of the logarithmic function, an underestimate of $p(x_k)$ causes an error on $H(X = x_k)$ that is larger than the one given by an overestimation of the same quantity. As a result, entropy estimators are biased downwards, that is

$$E\big[\hat{H}^{emp}(X)\big] \le H(X)$$

It has been shown that the variance of the empirical estimator is upper-bounded by $(\log m)^2 / m$, which depends only on the number of samples, whereas the asymptotic bias of the estimate depends also on the number of bins $|\mathcal{X}|$ [23]. When $|\mathcal{X}| \gg m$, this estimator can still have a low variance but the bias can become very large [23].
The Miller-Madow correction is then given by the following formula, which is the empirical entropy corrected by the asymptotic bias,

$$\hat{H}^{mm}(X) = \hat{H}^{emp}(X) + \frac{|\mathcal{X}| - 1}{2m}$$

where $|\mathcal{X}|$ is the number of bins with nonzero probability and $m$ is the number of samples. This correction, while adding no computational cost to the empirical estimator, reduces the bias without changing variance. As a result, the Miller-Madow estimator is often preferred to the naive empirical entropy estimator.
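Both estimators are short enough to be written out in base R. The sketch below works on a vector of bin counts; the function names are hypothetical and are not the routines exported by minet.

```r
# Empirical (plug-in) entropy in nats from a vector of bin counts.
entropy_empirical <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Miller-Madow: empirical entropy plus (number of non-empty bins - 1) / (2 m).
entropy_miller_madow <- function(counts) {
  m <- sum(counts)
  nonempty <- sum(counts > 0)
  entropy_empirical(counts) + (nonempty - 1) / (2 * m)
}

counts <- c(5, 5, 0)           # ten samples, two non-empty bins out of three
entropy_empirical(counts)      # log(2)
entropy_miller_madow(counts)   # log(2) + 1/20
```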
2.2 Shrink entropy estimator
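The rationale of the shrink estimator [27] is to combine two different estimators, one with low variance and one with low bias, by using a weighting factor λ ∈ [0,1]:

$$\hat{p}_\lambda(x_k) = \lambda \frac{1}{|\mathcal{X}|} + (1 - \lambda) \frac{nb(x_k)}{m}$$

Shrinkage is a general technique to improve an estimator for a small sample size [3]. As the value of λ tends to one, the estimated entropy is moved toward the maximal entropy (uniform probability), whereas when λ is zero the estimated entropy tends to the value of the empirical one.
Let λ* be the value minimizing the mean square function, see [27],

$$\lambda^* = \arg\min_{\lambda \in [0,1]} E\left[\sum_{k \in \mathcal{X}} \big(\hat{p}_\lambda(x_k) - p(x_k)\big)^2\right]$$

It has been shown in [28] that the optimal λ is given by

$$\lambda^* = \frac{|\mathcal{X}| \left(m^2 - \sum_{k \in \mathcal{X}} nb(x_k)^2\right)}{(m - 1)\left(|\mathcal{X}| \sum_{k \in \mathcal{X}} nb(x_k)^2 - m^2\right)}$$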
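A minimal base R sketch of the shrink estimator follows, assuming the closed-form shrinkage intensity λ* = |X|(m² − Σ nb(x_k)²) / ((m − 1)(|X| Σ nb(x_k)² − m²)) truncated to [0,1]. The function names are hypothetical, and the choice λ = 1 when the denominator vanishes is an assumption of this sketch rather than something stated in the text.

```r
# Shrinkage intensity toward the uniform distribution, from bin counts.
shrink_lambda <- function(counts) {
  m <- sum(counts)
  nbins <- length(counts)
  sq <- sum(counts^2)
  denom <- (m - 1) * (nbins * sq - m^2)
  if (denom == 0) return(1)   # assumption: degenerate case, shrink fully
  min(1, max(0, nbins * (m^2 - sq) / denom))
}

# Entropy (in nats) of the shrunken bin probabilities.
entropy_shrink <- function(counts) {
  lambda <- shrink_lambda(counts)
  p <- lambda / length(counts) + (1 - lambda) * counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log(p))
}

counts <- c(4, 2, 0)
shrink_lambda(counts)    # 0.4 for these counts
entropy_shrink(counts)   # lies between the empirical entropy and log(3)
```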
2.3 The Schurmann-Grassberger Estimator
The Dirichlet distribution can be used in order to estimate the entropy of a discrete random variable. The Dirichlet distribution is the multivariate generalization of the beta distribution. It is also the conjugate prior of the multinomial distribution in Bayesian statistics. More precisely, the density of a Dirichlet distribution takes the following form

$$f(X; \beta) = \frac{\Gamma\left(\sum_{i \in \mathcal{X}} \beta_i\right)}{\prod_{i \in \mathcal{X}} \Gamma(\beta_i)} \prod_{i \in \mathcal{X}} x_i^{\beta_i - 1}$$

where $\beta_i$ is the prior probability of an event $x_i$ and $\Gamma(\cdot)$ is the gamma function (see [25,27,29] for more details).
In case of no a priori knowledge, the $\beta_k$ are assumed to be equal ($\beta_k = N$, $k \in \mathcal{X}$) so that no event becomes more probable than another. Note that using a Dirichlet prior with parameters $N$ is equivalent to adding $N \ge 0$ "pseudo-counts" to each bin $i \in \mathcal{X}$. The prior actually provides the estimator with the information that $N$ counts have been observed in previous experiments. From that viewpoint, $N$ becomes the a priori sample size.
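The entropy of a Dirichlet distribution can be computed directly with the following equation:

$$\hat{H}^{dir}(X) = \frac{1}{m + |\mathcal{X}| N} \sum_{k \in \mathcal{X}} \big(nb(x_k) + N\big) \Big(\psi\big(m + |\mathcal{X}| N + 1\big) - \psi\big(nb(x_k) + N + 1\big)\Big)$$

where $\psi(z) = \frac{d \ln \Gamma(z)}{dz}$ is the digamma function.
Various choices of prior parameters have been proposed in the literature [29-31]. Schurmann and Grassberger have proposed the prior $N = 1/|\mathcal{X}|$ [32], which has been retained in the package.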
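The Dirichlet-based estimate can be sketched in base R with the built-in digamma function. The sketch assumes the estimate is the posterior mean of the entropy under a symmetric Dirichlet prior with N pseudo-counts per bin, and that the Schurmann-Grassberger choice is N = 1/|X|; the function names are hypothetical.

```r
# Posterior mean entropy (in nats) under a symmetric Dirichlet prior
# with N pseudo-counts added to each bin.
entropy_dirichlet <- function(counts, N) {
  a <- counts + N                      # posterior Dirichlet parameters
  A <- sum(counts) + length(counts) * N
  sum(a * (digamma(A + 1) - digamma(a + 1))) / A
}

# Schurmann-Grassberger prior (assumed here to be N = 1 / number of bins).
entropy_sg <- function(counts) {
  entropy_dirichlet(counts, N = 1 / length(counts))
}

entropy_dirichlet(c(0, 0), N = 1)  # no data, uniform prior on two bins
entropy_sg(c(5000, 5000))          # approaches log(2) for large samples
```

With no data and a uniform prior on two bins, the expected entropy is exactly one half nat, which gives a quick sanity check of the formula.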
Implementation of estimators in minet
The mutual information matrix is estimated by using the function build.mim(dataset, estimator). This function returns a matrix of pairwise mutual information values computed in nats (base e) and takes two arguments:
1. the data frame dataset, which stores the gene expression dataset or a generic dataset where columns contain variables/features and rows contain outcomes/samples,
2. the string estimator, which denotes the routine used to perform the mutual information estimation.
The package makes available four estimation routines: "mi.empirical", "mi.shrink", "mi.sg" and "mi.mm" (default: "mi.empirical"), each referring to one of the estimation techniques explained above.
3 Discretization methods
All the estimators discussed in the previous section have been designed for discrete variables. If the random variable $X$ is continuous and takes values comprised between $a$ and $b$, it is then required to partition the interval $[a, b]$ into $|\mathcal{X}|$ subintervals in order to adopt a discrete entropy estimator. The two most used discretization algorithms are the equal width and the equal frequency quantization. These are explained in the next sections. Other discretization methods can be found in [33-35].
3.1 Equal Width
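The principle of the equal width discretization is to divide the range $[a_i, b_i]$ of each variable $X_i$, $i \in \{1, 2, \dots, n\}$, in the dataset into $|\mathcal{X}_i|$ subintervals of equal size:

$$\left[a_i,\ a_i + \tfrac{b_i - a_i}{|\mathcal{X}_i|}\right),\ \left[a_i + \tfrac{b_i - a_i}{|\mathcal{X}_i|},\ a_i + 2\,\tfrac{b_i - a_i}{|\mathcal{X}_i|}\right),\ \dots,\ \left[a_i + \tfrac{(|\mathcal{X}_i| - 1)(b_i - a_i)}{|\mathcal{X}_i|},\ b_i + \varepsilon\right)$$

Note that an ε is added in the last interval in order to include the maximal value in one of the $|\mathcal{X}_i|$ bins. This discretization scheme has an O(m) complexity cost per variable.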
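A base R sketch of equal width binning for a single variable is given below. The function name discretize_equalwidth and its arguments are hypothetical; instead of adding an ε to the last interval, the sketch clamps the maximal value into the last bin, which has the same effect. Passing the minimum and maximum of the whole dataset as a and b would give a global equal width variant.

```r
# Equal width discretization of one variable into nbins bins labelled 1..nbins.
# a and b default to the range of x.
discretize_equalwidth <- function(x, nbins, a = min(x), b = max(x)) {
  if (a == b) return(rep(1L, length(x)))    # constant variable: a single bin
  bin <- floor((x - a) / (b - a) * nbins) + 1
  as.integer(pmin(bin, nbins))              # the maximum goes into the last bin
}

discretize_equalwidth(c(0, 2.5, 5, 7.5, 10), nbins = 4)
```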
3.2 Global Equal Width
The principle of the global equal width discretization is the same as the equal width (Sec. 3.1) except that the considered range [a, b] is not the range of each random variable such as in Sec. 3.1 but the range of the random vector composed of all the variables in the dataset. In other words, a and b are respectively the minimal and the maximal value of the dataset.
3.3 Equal Frequency
The equal frequency discretization scheme consists in partitioning the range $[a_i, b_i]$ of each variable $X_i$ in the dataset into $|\mathcal{X}_i|$ intervals, each having the same number $m/|\mathcal{X}_i|$ of data points. As a result, the size of each interval can be different. Note that if the $|\mathcal{X}_i|$ intervals have equal frequencies, the computation of entropy is straightforward: it is $\log |\mathcal{X}_i|$. However, there can be more than $m/|\mathcal{X}_i|$ identical values in a vector of measurements. In such a case, one of the bins will be denser than the others and the resulting entropy will differ from $\log |\mathcal{X}_i|$. It should be noted that this discretization is reported in some papers as one of the most efficient methods (e.g. for naive Bayes classification) [35].
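The effect of ties can be seen with a small rank-based sketch in base R. The function name discretize_equalfreq is hypothetical and the rank-based rule is only one possible way of building equal frequency bins; it is not claimed to match the package's routine. Tied values are given the same rank so that identical measurements always fall into the same bin.

```r
# Equal frequency discretization of one variable into bins labelled 1..nbins.
discretize_equalfreq <- function(x, nbins) {
  r <- rank(x, ties.method = "min")   # identical values share one rank
  as.integer(ceiling(r * nbins / length(x)))
}

# Without ties every bin receives m / nbins points, so the entropy is log(nbins).
no_ties <- discretize_equalfreq(c(8, 1, 6, 3, 5, 4, 7, 2), nbins = 4)
table(no_ties)

# With many identical values one bin becomes denser and the entropy drops.
with_ties <- discretize_equalfreq(c(1, 1, 1, 1, 1, 2, 3, 4), nbins = 4)
table(with_ties)
```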
Implementation of discretization strategies in minet
The discretization is performed in minet by the function
discretize(dataset, disc = "equalfreq", nbins = sqrt(nrow(dataset)))
where
• dataset is the dataset to be discretized
• disc is a string which can take three values: "equalfreq", "equalwidth" and "globalequalwidth" (default is "equalfreq").
• nbins, the number of bins to be used for discretization, which is by default set to $\sqrt{m}$, where m is the number of samples [35]. Note that the functions used by the built-in R hist() function to choose the number of classes can also be used here, such as nclass.FD(dataset), nclass.scott(dataset) and nclass.Sturges(dataset).
4 Assessment of the network inference algorithm
A network inference problem can be seen as a binary decision problem where the inference algorithm plays the role of a classifier: for each pair of nodes, the algorithm either returns an edge or not. Each pair of nodes can thus be assigned a positive label (an edge) or a negative one (no edge).
A positive label (an edge) predicted by the algorithm is considered as a true positive (TP) or as a false positive (FP) depending on the presence or not of the corresponding edge in the underlying true network, respectively. Analogously, a negative label is considered as a true negative (TN) or a false negative (FN) depending on whether the corresponding edge is absent or present in the underlying true network, respectively. Note that all mutual information network inference methods use a threshold value in order to delete the arcs whose score is too low. Hence, for each threshold value, a confusion matrix can be computed.
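As an illustration, the confusion counts for one threshold can be obtained in base R as follows. The function name confusion, the toy matrices and the convention that an edge is predicted when its score is strictly larger than the threshold are assumptions of this sketch; the network is treated as undirected, so only the upper triangle is counted.

```r
# Confusion counts of an inferred weighted network against a reference network.
confusion <- function(net, ref, threshold) {
  up <- upper.tri(net)              # each undirected pair is counted once
  pred  <- net[up] > threshold      # predicted edges
  truth <- ref[up] > 0              # edges of the reference network
  c(TP = sum(pred & truth),  FP = sum(pred & !truth),
    TN = sum(!pred & !truth), FN = sum(!pred & truth))
}

# Hypothetical 4-gene example; upper.tri fills pairs in the order
# (1,2), (1,3), (2,3), (1,4), (2,4), (3,4).
net <- matrix(0, 4, 4)
net[upper.tri(net)] <- c(0.9, 0.6, 0.4, 0.1, 0.0, 0.7)
net <- net + t(net)
ref <- matrix(0, 4, 4)
ref[upper.tri(ref)] <- c(1, 1, 0, 0, 1, 0)
ref <- ref + t(ref)

cm <- confusion(net, ref, threshold = 0.5)
```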
4.1 ROC curves
The false positive rate is defined as

$$FPR = \frac{FP}{TN + FP}$$

and the true positive rate as

$$TPR = \frac{TP}{TP + FN}$$

also known as recall or sensitivity.
A Receiver Operating Characteristic (ROC) curve is a graphical plot of the TPR (true positive rate) vs. FPR (false positive rate) for a binary classifier system as the threshold is varied [36]. A perfect classifier would yield a point in the upper left corner (having coordinates [0,1]) of the ROC space, representing 100% TPR (all true positives are found) and 0% FPR (no false positives are found). A completely random guess gives a point along the diagonal line (the so-called line of no-discrimination) which goes from the bottom left to the top right corner. Points above the diagonal line indicate good classification results, while points below the line indicate wrong results.
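The points of a ROC curve can be computed with a few lines of base R. The function name roc_points, the toy scores and the strict comparison with the threshold are hypothetical choices made for this sketch; FPR is taken as FP / (FP + TN) and TPR as TP / (TP + FN).

```r
# One (FPR, TPR) point per threshold, from edge scores and true edge labels.
roc_points <- function(scores, truth, thresholds) {
  t(sapply(thresholds, function(th) {
    pred <- scores > th
    c(FPR = sum(pred & !truth) / sum(!truth),
      TPR = sum(pred & truth)  / sum(truth))
  }))
}

scores <- c(0.9, 0.6, 0.4, 0.1, 0.0, 0.7)            # hypothetical edge scores
truth  <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)   # hypothetical reference edges
roc <- roc_points(scores, truth, thresholds = c(-1, 0.5, 1))
```

The lowest threshold predicts every edge and lands on the top right corner, while the highest one predicts none and lands on the bottom left corner.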
4.2 PR curves
It is generally recommended [37] to use receiver operator characteristic (ROC) curves when evaluating binary decision problems in order to avoid effects related to the chosen threshold. However, ROC curves can present an overly optimistic view of an algorithm's performance if there is a large skew in the class distribution, as typically encountered in transcriptional network inference because of sparseness. To tackle this problem, precision-recall (PR) curves have been cited as an alternative to ROC curves [38].
Let the precision quantity

$$p = \frac{TP}{TP + FP}$$

measure the fraction of real edges among the ones classified as positive and the recall quantity

$$r = \frac{TP}{TP + FN}$$

also known as true positive rate (TPR), denote the fraction of real edges that are correctly inferred. These quantities depend on the threshold chosen to return a binary decision. The PR curve is a diagram which plots the precision (p) versus recall (r) for different values of the threshold on a two-dimensional coordinate system.
4.3 F-Scores
Note that a compact representation of the PR diagram is returned by the maximum and/or the average of the F-score quantity [39]:

$$F = \frac{2pr}{r + p}$$

which is a harmonic average of precision and recall.
The general formula for nonnegative real β is:

$$F_\beta = \frac{(1 + \beta^2)\, p\, r}{\beta^2 p + r}$$

where β is a parameter denoting the weight of the recall. Two commonly used F-scores are the F_{2}-measure, which weights recall twice as much as precision, and the F_{0.5}-measure, which weights precision twice as much as recall. In transcriptional network inference, precision is often a more desirable feature than recall since it is expensive to investigate if a gene regulates another.
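The effect of β can be checked numerically with a short base R sketch. It assumes the standard definition F_β = (1 + β²) p r / (β² p + r); the function name fscore and the confusion counts are hypothetical, and the sketch is not claimed to reproduce the package's fscores function.

```r
# F-beta score from confusion counts.
fscore <- function(tp, fp, fn, beta = 1) {
  p <- tp / (tp + fp)   # precision
  r <- tp / (tp + fn)   # recall
  (1 + beta^2) * p * r / (beta^2 * p + r)
}

# Hypothetical counts with perfect recall (r = 1) but modest precision (p = 0.5):
fscore(tp = 2, fp = 2, fn = 0, beta = 2)    # 5/6, rewards the high recall
fscore(tp = 2, fp = 2, fn = 0, beta = 1)    # 2/3
fscore(tp = 2, fp = 2, fn = 0, beta = 0.5)  # 5/9, penalizes the low precision
```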
Assessment functionalities in minet
In order to benchmark the inference methods, the package provides a number of assessment tools. The validate(net, ref.net, steps = 50) function allows the user to compare an inferred network net to a reference network ref.net, described by a Boolean adjacency matrix. The assessment process consists in removing the inferred edges having a score below a given threshold and in computing the related confusion matrix, for steps thresholds ranging from the minimum to the maximum value of edge weights. A resulting data frame table containing the list of all the steps confusion matrices is returned and made available for further analysis.
In particular, the function pr(table) returns the related precisions and recalls, rates(table) computes true positive and false positive rates, while the function fscores(table, beta) returns the F_{β}-scores. The functions show.pr(table) and show.roc(table) allow the user to plot PR-curves and ROC-curves respectively (Figure 3) from a list of confusion matrices.
Figure 3. Precision-Recall curves plotted with show.pr(table).
5 Example
Once the R platform is launched, the package, its description and its vignette can be loaded using the following commands:
library(minet)
library(help = minet)
vignette("minet")
A demo script (demo(demo)) shows the main functionalities of the package that we describe in the following.
In order to infer a network with the minet package, four steps are required:
• data discretization,
• MIM computation,
• network inference,
• normalization of the network (optional).
The main function of the package is minet, which sequentially executes the four steps mentioned above (see Figure 1).
Figure 1. The four steps in the minet function (discretization disc, mutual information matrix build.mim, inference mrnet, aracne, clr and normalization norm).
The function minet(dataset, method, estimator, disc, nbins) takes the following arguments: dataset, a matrix or a data frame containing the microarray data; method, the inference algorithm (such as ARACNE, CLR or MRNET); estimator, the entropy estimator used for the computation of mutual information (empirical, Miller-Madow, shrink, Schurmann-Grassberger); disc, the binning algorithm (i.e. equal frequency or equal size interval); and nbins, which sets the number of bins to use. The final step of the minet function is the normalization using the norm(net) function. This step normalizes all the weights of the inferred adjacency matrix between 0 and 1. Hence, the minet function returns the inferred network as a weighted adjacency matrix with values ranging from 0 to 1, where the higher the weight, the stronger the evidence that a gene-gene interaction exists.
For demo purposes the package makes available also the dataset syn.data representing the expression of 50 genes in 100 experiments. This dataset has been synthetically generated from the network syn.net using the microarray data generator Syntren [40]. This dataset can be loaded with data(syn.data) and the corresponding original network with data(syn.net).
Note that the command res <- minet(syn.data, "mrnet", "mi.shrink", "equalwidth", 10) is a compact way to execute the following sequence of instructions:
discdata <- discretize(syn.data, "equalwidth", 10)
mim <- build.mim(discdata, "mi.shrink")
net <- mrnet(mim)
res <- norm(net)
In order to plot a PR-curve (see Figure 3), the functions show.pr and validate can be used.
table <- validate(res, syn.net)
show.pr(table)
In order to display the inferred network, the Rgraphviz package [41] can be used with the following commands (see Figure 2):
library(Rgraphviz)
graph <- as(res, "graphNEL")
plot(graph)
Figure 2. Graph generated with minet and plotted with Rgraphviz.
Note that, for the sake of computational efficiency, all the inference functions as well as the entropy estimators are implemented in C++. As a reference, a network of five hundred variables may be inferred in less than one minute on an Intel Pentium 4 with 2 GHz and 512 MB of DDR SDRAM.
6 Conclusion
Transcriptional network inference is a key issue toward the understanding of the relationships between the genes of an organism. Nevertheless, few public domain tools are available when a thorough comparison of existing approaches is at stake. A new, freely available R/Bioconductor package has been introduced in this paper. This package makes available to biologists and bioinformatics practitioners a set of tools to infer networks from microarray datasets with a large number (several thousands) of genes. Four information-theoretic methods of network inference (i.e. Relevance Networks, CLR, ARACNE and MRNET), four different entropy estimators (i.e. empirical, Miller-Madow, Schurmann-Grassberger and shrink) and three validation tools (i.e. F-scores, PR curves and ROC curves) are implemented in the package. We deem that this tool is an effective answer to the increasing need of comparative tools in the growing domain of transcriptional network inference from expression data.
Authors' contributions
PEM and FL carried out the implementation of the R package minet (up to version 1.1.6). PEM and GB have written the package documentation as well as the manuscript. All authors read and approved the final version of the manuscript.
Availability and requirements
The R package minet is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org as well as from the Bioconductor website http://bioconductor.org. The package runs on Linux, Mac OS and MS Windows using an installed version of R.
Table 1. Available functions of the package minet (version 1.1.6)
Acknowledgements
This work was partially funded by the Communauté Française de Belgique under ARC grant no. 04/09307. The authors thank their colleague Catharina Olsen for her appreciable comments, suggestions and testing of package functionalities. The authors also thank Korbinian Strimmer as well as the reviewers for their useful comments on the package and the paper.
References

van Someren EP, Wessels LFA, Backer E, Reinders MJT: Genetic network modeling.
Pharmacogenomics 2002, 3(4):507525. PubMed Abstract  Publisher Full Text

Gardner TS, Faith J: Reverseengineering transcription control networks.

Schäfer J, Strimmer K: An empirical Bayes approach to inferring largescale gene association networks.
Bioinformatics 2005, 21(6):754764. PubMed Abstract  Publisher Full Text

Faith J, Hayete B, Thaden J, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins J, Gardner T: LargeScale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles.
PLoS Biology 2007., 5 PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Basso K, Margolin A, Stolovitzky G, Klein U, DallaFavera R, Califano A: Reverse engineering of regulatory networks in human B cells.
Nature Genetics 2005., 37 PubMed Abstract  Publisher Full Text

Butte AJ, PT , Slonim D, Golub T, Kohane I: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks.
Proceedings of the National Academy of Sciences 2000, 97(22):1218212186. Publisher Full Text

Butte AJ, Kohane IS: Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurments.
Pac Symp Biocomput 2000, 418429. PubMed Abstract

Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Favera RD, Califano A: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context.
BMC Bioinformatics 2006., 7 PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Meyer PE, Kontos K, Lafitte F, Bontempi G: InformationTheoretic Inference of Large Transcriptional Regulatory Networks.
EURASIP J Bioinform Syst Biol 2007, 79879. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Gentleman RIR: R: A language for data analysis and graphics. [http://www.Rproject.org] webcite

Venables WN, Ripley BD: Modern Applied Statistics with S. Fourth edition. Springer; 2002.

Gentleman RC, Carey VJ, Bates DJ, Bolstad BM, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth GK, Tierney L, Yang YH, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics.
Genome Biology 2004., 5 PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Cheng J, Greiner R, Kelly J, Bell D, Liu W: Learning Bayesian Networks from Data: An InformationTheory Based Approach.

Chow C, Liu C: Approximating discrete probability distributions with dependence trees.

Pearl J: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc; 1988.

Cover TM, Thomas JA: Elements of Information Theory. New York: John Wiley; 1990.

Tourassi GD, Frederick ED, Markey MK, C E, Floyd J: Application of the mutual information criterion for feature selection in computeraided diagnosis.
Medical Physics 2001, 28(12):23942402. PubMed Abstract  Publisher Full Text

Peng H, Long F, Ding C: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy.
IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(8):1226-1238.

Ding C, Peng H: Minimum Redundancy Feature Selection From Microarray Gene Expression Data.
Journal of Bioinformatics and Computational Biology 2005, 3(2):185-205.

Merz P, Freisleben B: Greedy and Local Search Heuristics for Unconstrained Binary Quadratic Programming.
Journal of Heuristics 2002, 8(2):13811231.

Olsen C, Meyer PE, Bontempi G: On the Impact of Entropy Estimator in Transcriptional Regulatory Network Inference. In 5th International Workshop on Computational Systems Biology (WSCB 08). Edited by Ahdesmäki M, Strimmer K, Radde N, Rahnenführer J, Klemm K, Lähdesmäki H, Yli-Harja O. Tampere International Center for Signal Processing; 2008:41.

Daub CO, Steuer R, Selbig J, Kloska S: Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data.
BMC Bioinformatics 2004, 5.

Paninski L: Estimation of entropy and mutual information.
Neural Computation 2003, 15(6):1191-1253.

Beirlant J, Dudewicz EJ, Györfi L, Meulen E: Nonparametric Entropy Estimation: An Overview.

Nemenman I, Bialek W, de Ruyter van Steveninck R: Entropy and information in neural spike trains: Progress on the sampling problem.
Phys Rev E Stat Nonlin Soft Matter Phys 2004, 69(5 Pt 2):056111.

Darbellay G, Vajda I: Estimation of the information by an adaptive partitioning of the observation space.

Hausser J: Improving entropy estimation and inferring genetic regulatory networks. [http://strimmerlab.org/publications/mschausser.pdf]
Master's thesis, National Institute of Applied Sciences Lyon; 2006.

Schäfer J, Strimmer K: A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics.

Wu L, Neskovic P, Reyes E, Festa E, Heindel W: Classifying n-back EEG data using entropy and mutual information features.

Beerenwinkel N, Schmidt B, Walter H, Kaiser R, Lengauer T, Hoffmann D, Korn K, Selbig J: Diversity and complexity of HIV-1 drug resistance: A bioinformatics approach to predicting phenotype from genotype.
Proc Natl Acad Sci U S A 2002, 99(12):8271-8276.

Krichevsky R, Trofimov V: The performance of universal coding.

Schurmann T, Grassberger P: Entropy estimation of symbol sequences.

Dougherty J, Kohavi R, Sahami M: Supervised and Unsupervised Discretization of Continuous Features.

Liu H, Hussain F, Tan CL, Dash M: Discretization: An Enabling Technique.

Yang Y, Webb GI: On why discretization works for naive-Bayes classifiers.
Proceedings of the 16th Australian Joint Conference on Artificial Intelligence 2003.

Davis J, Goadrich M: The Relationship Between Precision-Recall and ROC Curves.
Proceedings of the 23rd international conference on Machine learning 2006.

Provost F, Fawcett T, Kohavi R: The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA; 1998:445-453.

Bockhorst J, Craven M: Markov Networks for Detecting Overlapping Elements in Sequence Data. In Advances in Neural Information Processing Systems 17. Edited by Saul LK, Weiss Y, Bottou L. Cambridge, MA: MIT Press; 2005:193-200.

Sokolova M, Japkowicz N, Szpakowicz S: Beyond Accuracy, F-score and ROC: a Family of Discriminant Measures for Performance Evaluation.
Proceedings of the AAAI'06 workshop on Evaluation Methods for Machine Learning 2006.

den Bulcke TV, Leemput KV, Naudts B, van Remortel P, Ma H, Verschoren A, Moor BD, Marchal K: SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms.
BMC Bioinformatics 2006, 7:43.

Carey VJ, Gentry J, Whalen E, Gentleman R: Network Structures and Algorithms in Bioconductor.
Bioinformatics 2005, 21:135-136.