BMC Proceedings


This article is part of the supplement: Proceedings of the 12th European workshop on QTL mapping and marker assisted selection

Open Access Proceedings

Data modeling as a main source of discrepancies in single and multiple marker association methods

Mônica C Ledur1,2*, Nicolas Navarro2 and Miguel Pérez-Enciso2,3

Author Affiliations

1 Embrapa Suínos e Aves, BR 153, Km 110, 89700-000, Concórdia, SC, Brazil

2 Dept. Ciencia Animal i dels Aliments, Facultat de Veterinaria, Universitat Autonoma de Barcelona, 08193, Bellaterra, Spain

3 Institut Català de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain

BMC Proceedings 2009, 3(Suppl 1):S9 doi:

Published: 23 February 2009

Additional files

Additional file 1:

SNPs associated with the phenotype by the SMA and Blossoc methods using raw data. The threshold for SMA is P < 10^-8 and for Blossoc is HQ Score ≥ 15. 1 – Bootstrap posterior probabilities (BPP) from 1000 models for SMA raw, computed for the 33 SNPs that pass the threshold of -log10(p) ≥ 8. * – Significant associations. Considering a BPP > 0.25 for SMA raw decreased the number of associated SNPs to 15. Considering an adjusted threshold of ≥ 65 for Blossoc raw, to account for the inflation caused by the population structure, decreased the number of significantly associated SNPs to 19.
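The bootstrap posterior probability of a SNP can be read as the fraction of bootstrap models in which that SNP is retained. The following is a minimal sketch of that idea only, not the authors' R script: the simulated data, the function names and the per-sample selection rule (absolute genotype-phenotype correlation above a hypothetical cutoff `r_cutoff`) are all assumptions made for illustration.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_posterior_probabilities(genotypes, phenotype,
                                      n_models=1000, r_cutoff=0.3, seed=0):
    """Fraction of bootstrap samples in which each SNP is selected.

    genotypes: one list of 0/1/2 genotype codes per SNP.
    The selection rule (|r| >= r_cutoff) is a hypothetical stand-in for
    whatever model selection is run on each bootstrap sample.
    """
    rng = random.Random(seed)
    n = len(phenotype)
    counts = [0] * len(genotypes)
    for _ in range(n_models):
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [phenotype[i] for i in idx]
        for j, snp in enumerate(genotypes):
            x = [snp[i] for i in idx]
            if abs(pearson_r(x, y)) >= r_cutoff:
                counts[j] += 1
    return [c / n_models for c in counts]


# Simulated example: 3 SNPs, only the first one affects the phenotype.
rng = random.Random(1)
n = 200
snps = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]
        for _ in range(3)]
phenotype = [2.0 * snps[0][i] + rng.gauss(0, 1) for i in range(n)]

bpp = bootstrap_posterior_probabilities(snps, phenotype, n_models=200)
# Keep SNPs whose BPP exceeds 0.25, the cutoff quoted for SMA raw.
retained = [j for j, p in enumerate(bpp) if p > 0.25]
```

Only the 0.25 cutoff comes from the file description; everything else in the example is illustrative.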

Format: PDF. Size: 44 KB.

Additional file 2:

Total CPU time required for the analyses, including the significance tests.

1 – a) 1st step: 600 bins (179,700 interactions × 4 tests); b) 2nd step: top interactions with -log10(p) > 3 for 675 pairwise locations to refine (67,500 interactions; 4 tests with an average of 169 pairwise locations to refine within 1 cM).

2 – Correction with a mixed model including sex, generation and infinitesimal effects using QxPak.

Note: Model aggregation using 1000 bootstrap samples on the 33 putative QTLs of the SMA raw took 4 h 37 m 46 s (Additional file 1). Analyses were performed on a Linux server with dual Xeon processors and 8 GB RAM. Of the programs used, only Blossoc was specifically dedicated to GWAS. The R scripts for SMA, bagging and epistasis were not optimized for speed. QxPak is a program initially dedicated to QTL mapping in livestock.

SMA on raw data outperformed the two other approaches in the initial genome scan. Blossoc is very fast for a haplotype-based method. QxPak ran in a reasonable time considering its internal correction for the population structure. However, this running time precludes using computationally intensive procedures to control false positives. The bagging took about 4 h 38 m (33 candidate SNPs, 1000 models, on average 13.4 SNPs per model). Proper calibration of the BPP will require S times as much computation (S = number of simulations done for calibration). Nevertheless, thorough code optimization will be straightforward and will dramatically reduce computing time. The two-step strategy for epistasis was reasonably efficient given the large number of interaction terms that were tested.
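The interaction count for the first epistasis step and the per-model bagging cost follow from the figures quoted above. The short check below is a sketch of that arithmetic; the variable names are illustrative and are not taken from the authors' scripts.

```python
from math import comb

N_BINS = 600          # 1 cM bins, one representative SNP each
TESTS_PER_PAIR = 4    # the four interaction tests run per pair of bins

# Number of unordered pairs of bins: 600 * 599 / 2.
n_pairs = comb(N_BINS, 2)
n_tests_step1 = n_pairs * TESTS_PER_PAIR

# Bagging: 4 h 37 m 46 s for 1000 bootstrap models.
bagging_seconds = 4 * 3600 + 37 * 60 + 46
N_MODELS = 1000
seconds_per_model = bagging_seconds / N_MODELS

print(n_pairs)                      # 179700
print(n_tests_step1)                # 718800
print(round(seconds_per_model, 1))  # 16.7
```

The pair count matches the 179,700 interactions reported for the first step, and the bagging run works out to roughly 17 seconds per bootstrap model.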

Format: DOC. Size: 20 KB.

Additional file 3:

Methods for interactions within and between loci and with sex. Analyses of the interaction within locus (dominance), between loci (epistasis) and with sex were performed using only the corrected SMA procedure, implemented in an R script. We used residual data from a mixed model including sex, generation and pedigree. This approximation is necessary given the computational burden of testing between-loci interactions. For dominance and sex, we tested the dominance effect, or the interaction between the additive effect and sex, against a model incorporating only an additive effect. No procedure for controlling false positives was applied, and only the best results with strong -log10(p) are reported here.

For epistasis, a two-step strategy based on LD between SNPs was used. First, we subdivided the genome into 1 cM bins and tested epistasis between the bins. In each bin, the SNP with the highest average LD (r^2) with all the other SNPs in the bin was chosen as the representative SNP of the bin (tagSNP) and used for the interaction testing. The Cockerham parameterization of the model was used [11], and all four possible interactions (a×a, a×d, d×a, d×d) were tested against a reduced model incorporating only the additive and dominance effects. Then, for each type of interaction, those with -log10(p) ≥ 3 were selected in order to refine the location of the interacting loci. We refined these locations by evaluating interactions between all SNPs contained in the two bins initially detected as showing a significant interaction. Finally, we ranked the results for each type of interaction according to their significance value. To control for potential LD effects, we removed SNPs within a minimal distance of 10 cM from the top-ranked list (-log10(p) ≥ 6).
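The tagSNP choice described above (the SNP with the highest average r^2 with the other SNPs in its bin) can be sketched as follows. This is an illustration under assumptions, not the authors' R code: r^2 is taken here as the squared Pearson correlation between 0/1/2 genotype codes, which may differ from the LD estimator actually used, and the function names and toy genotypes are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors.

    Returns 0.0 when either vector is constant (monomorphic SNP).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def pick_tag_snp(bin_genotypes):
    """Return the SNP name with the highest mean r^2 to the other SNPs in the bin.

    bin_genotypes: dict mapping SNP name -> list of 0/1/2 genotype codes.
    Ties are resolved in favour of the SNP listed first.
    """
    names = list(bin_genotypes)
    if len(names) == 1:
        return names[0]
    best_name, best_score = None, -1.0
    for a in names:
        others = [b for b in names if b != a]
        score = sum(r_squared(bin_genotypes[a], bin_genotypes[b])
                    for b in others) / len(others)
        if score > best_score:
            best_name, best_score = a, score
    return best_name


# Toy bin: snpA and snpB are in complete LD, snpC is only weakly related.
toy_bin = {
    "snpA": [0, 1, 2, 0, 1, 2, 0, 1],
    "snpB": [0, 1, 2, 0, 1, 2, 0, 1],
    "snpC": [2, 0, 1, 1, 0, 2, 1, 0],
}
tag = pick_tag_snp(toy_bin)
```

In this toy bin the tagSNP is one of the two SNPs in complete LD, because each of them carries information about the other in addition to whatever it shares with the third SNP.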

Format: DOC. Size: 21 KB.