Genetical genomics: use all data
1 Department of Food and Animal Science, Veterinary School, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
2 Institut Català de Recerca i Estudis Avançats, 08010 Barcelona, Spain
3 Artificial Intelligence Center, University of Oviedo at Gijón, 33271 Gijón, Spain
BMC Genomics 2007, 8:69 doi:10.1186/1471-2164-8-69
Published: 12 March 2007
Genetical genomics is a very powerful tool to elucidate the basis of complex traits and disease susceptibility. Despite its relevance, however, statistical modeling of expression quantitative trait loci (eQTL) has not received the attention it deserves. Based on two reasonable assertions, (i) that a good model should consider all available variables as potential effects, and (ii) that gene expression levels are highly interconnected, we suggest that an eQTL model should consider the remaining expression levels as potential regressors, in addition to the markers.
We show that power can be increased with this strategy. We also show, using classical statistical and support vector machine techniques in a reanalysis of public data, that the external transcripts, i.e., transcripts other than the one being analysed, explain on average much more variability than the markers themselves. The presence of eQTL hotspots is reassessed in light of these results.
Model choice is a critical yet neglected issue in genetical genomics studies. Although we are far from having a general strategy for model choice in this area, we can at least propose that each transcript level be scanned not only against the genotyped markers but also against the remaining gene expression levels. A stepwise regression strategy, for example, can be used to select the final model.
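As an illustration only, the proposal can be sketched as a greedy forward selection in which the candidate regressors for one target transcript are both the markers and the external transcripts. This is a minimal sketch on simulated data, not the procedure used in the reanalysis: the function names (`rss`, `forward_select`), the stopping rule (`min_gain`, a relative reduction in residual sum of squares), the 0/1 marker coding and all simulated effect sizes are assumptions made for the example.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def forward_select(y, candidates, names, max_terms=5, min_gain=0.10):
    """Greedy forward selection over the columns of `candidates`.

    At each step the column that most reduces the RSS is added. Selection
    stops when the best relative RSS reduction is below `min_gain`
    (an arbitrary illustrative threshold) or `max_terms` is reached.
    """
    design = np.ones((len(y), 1))  # intercept-only starting model
    chosen = []
    current = rss(y, design)
    while len(chosen) < max_terms:
        best_j, best_rss = None, current
        for j, name in enumerate(names):
            if name in chosen:
                continue
            trial = rss(y, np.column_stack([design, candidates[:, j]]))
            if trial < best_rss:
                best_j, best_rss = j, trial
        if best_j is None or (current - best_rss) / current < min_gain:
            break
        design = np.column_stack([design, candidates[:, best_j]])
        chosen.append(names[best_j])
        current = best_rss
    return chosen


# Simulated data (hypothetical): 200 individuals, 20 markers, 10 external transcripts.
rng = np.random.default_rng(0)
n = 200
markers = rng.integers(0, 2, size=(n, 20)).astype(float)  # 0/1 genotype coding
transcripts = rng.normal(size=(n, 10))                    # external expression levels

# Target transcript: one marker effect plus a stronger external-transcript effect.
y = 1.0 * markers[:, 3] + 2.0 * transcripts[:, 7] + rng.normal(scale=0.5, size=n)

# Scan markers and external transcripts together as potential regressors.
candidates = np.hstack([markers, transcripts])
names = [f"marker_{i}" for i in range(20)] + [f"transcript_{k}" for k in range(10)]

selected = forward_select(y, candidates, names)
print(selected)
```

In this simulation the external transcript is given the larger effect, so it enters the model before the marker. With real data the entry criterion would more plausibly be a significance test or an information criterion, and the number of candidate transcripts would be far larger than in this toy example.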