Methodology article

An entropy test for single-locus genetic association analysis

Manuel Ruiz-Marín1*, Mariano Matilla-García3, José Antonio García Cordoba1, Juan Luis Susillo-González2, Alejandro Romo-Astorga2, Antonio González-Pérez2, Agustín Ruiz2 and Javier Gayán2

Author Affiliations

1 Department of Quantitative Methods, Technical University of Cartagena, Paseo Alfonso XIII, 50, 30203, Cartagena, Spain

2 Department of Structural Genomics, Neocodex, Avenida Charles Darwin, 6, 41092 Sevilla, Spain

3 Department of Quantitative Economy I, UNED, Senda del Rey 11, 28040, Madrid, Spain


BMC Genetics 2010, 11:19  doi:10.1186/1471-2156-11-19

Published: 23 March 2010



Complex diseases arise from the combination of genetic and environmental factors, usually many of them, each with a small effect. Identifying these small-effect contributing factors remains a demanding task. There is a clear need for more powerful tests of genetic association, especially for the identification of rare effects.


We introduce a new genetic association test based on symbolic dynamics and symbolic entropy. Using freely available software, we applied this entropy test and a conventional test to simulated and real datasets, to illustrate the method and to estimate type I error and power. We also compared the new entropy test with the Fisher exact test for assessing association with low-frequency SNPs. The entropy test is generally more powerful than the conventional test, and can be significantly more powerful when the genotypic test is applied to low allele-frequency markers. We also show that both the Fisher and entropy methods are optimal for testing association with low-frequency SNPs (minor allele frequency (MAF) around 1-5%), and that both are conservative for very rare SNPs (MAF < 1%).


We have developed a new, simple, consistent and powerful test to detect genetic association of biallelic/SNP markers in case-control data, using symbolic dynamics and symbolic entropy as a measure of gene dependence. We also provide a standard asymptotic distribution for the test statistic. Because the test is based on entropy measures, it avoids smoothed nonparametric estimation. The entropy test is generally as powerful as, or even more powerful than, the conventional and Fisher tests. Furthermore, the entropy test is more computationally efficient than the Fisher exact test, especially for large numbers of markers. The entropy-based test therefore has the advantage of being optimal for most SNPs regardless of their allele frequency (minor allele frequency (MAF) between 1% and 50%). This property is quite beneficial, since many researchers tend to discard low allele-frequency SNPs from their analysis. They can now apply the same statistical test of association to all SNPs in a single analysis, which can be especially helpful for detecting rare effects.
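To make the idea of entropy as a measure of dependence between genotype and disease status concrete, the following is a minimal sketch of a generic entropy-based association test on a case-control genotype table. It is not the symbolic entropy statistic of this article, whose definition is not given in the abstract. Instead, it assumes the classical likelihood-ratio (G) statistic, which equals 2N times the empirical mutual information between status and genotype and is asymptotically chi-square distributed under no association. The function names `entropy`, `chi2_sf` and `entropy_association_test`, and the example counts, are hypothetical.

```python
import math


def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution of `counts`."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chi2_sf(x, df):
    """Chi-square survival function, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


def entropy_association_test(cases, controls):
    """Return (G, p_value) for genotype counts (AA, Aa, aa) in each group.

    Assumption: G = 2 * N * I(status; genotype), the generic likelihood-ratio
    statistic, used here as a stand-in for an entropy-based test.
    """
    n = sum(cases) + sum(controls)
    genotype_totals = [a + b for a, b in zip(cases, controls)]
    h_status = entropy([sum(cases), sum(controls)])
    h_genotype = entropy(genotype_totals)
    h_joint = entropy(list(cases) + list(controls))
    mutual_information = h_status + h_genotype - h_joint
    # Clamp tiny negative values caused by floating-point rounding.
    g = max(0.0, 2.0 * n * mutual_information)
    # Degrees of freedom: (2 groups - 1) * (observed genotype classes - 1).
    df = sum(1 for t in genotype_totals if t > 0) - 1
    if df < 1:
        return g, 1.0  # a single genotype class carries no information
    return g, chi2_sf(g, df)


# Hypothetical counts, for illustration only.
g, p = entropy_association_test(cases=(30, 50, 20), controls=(50, 40, 10))
print(f"G = {g:.3f}, asymptotic p-value = {p:.4f}")
```

Like the test described above, this sketch relies on an asymptotic distribution rather than on the enumeration of tables needed by an exact test, which is why a statistic of this kind is cheap to compute across many markers.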