Marked variation in predicted and observed variability of tandem repeat loci across the human genome

1 Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin 2, Ireland
2 UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland

BMC Genomics 2008, 9:175 doi:10.1186/1471-2164-9-175
Additional files

Additional file 1: ROC curves illustrating the behaviour of the different models. Each point corresponds to a threshold dividing predictions from the model into variants or non-variants. These predictions were then compared to the original WGS estimate of repeat variability. "Generic" represents predictions from the model trained on all the data. "Exonic" represents the model trained only on repeats within exons. "Combined" represents predictions taken for each repeat from a model specific to that repeat's unit length, i.e. for all dimer repeats, the prediction from a model trained on all dimers in the entire dataset was taken. These length-specific models were derived for 2-, 3-, 4-, 5-, 6- and 7–12-mer repeats and then combined. Format: TIFF Size: 154KB

Additional file 2: 209,214 repeats searched against WGS sequences. Data is presented for chromosome, start position, stop position, sequence of the tandem repeat unit, repeat unit copy-number, repeat block length, number of unique repeat block lengths, heterozygosity of the repeat, the number of times each unique repeat block length arises from the search against the WGS sequences, and unique repeat id. Format: ZIP Size: 4.2MB

Additional file 3: Distribution of heterozygosity for repeats searched against the WGS dataset. Format: TIFF Size: 32KB

Additional file 4: Summaries of the different stepwise logistic and linear regression models tested when modelling all covariates. The Pseudo R2 and R2 are used here as estimates of model fit. For all models, popsize is used to weight the data and only repeats with popsize <= 12 are modelled. Format: DOC Size: 35KB

Additional file 5: PPR script. Distribution of the program developed to predict potentially polymorphic repeats. Format: ZIP Size: 1.3MB

Additional file 6: Summary of models using all covariates. Format: DOC Size: 142KB

Additional file 7: Mean ratio of variant to invariant repeats over all chromosomes. Standard deviations from this mean (calculated in windows of 250 Mb) are shown as error bars. Format: TIFF Size: 182KB

Additional file 8: Genomic distribution of density of different repeat types and of mean popsize. Density of (A) repeats, (B) variant repeats and (C) predicted variants over all chromosomes. Density is calculated as the sum total of non-gapped, non-telomeric sequence divided by the number of observations for each repeat type and is thus lower when more observations are made. The distribution of mean popsize (D) is also shown. Format: TIFF Size: 257KB

Additional file 9: The fraction of different length repeats per chromosome. For each chromosome, the fraction is the count of each repeat type divided by the total number of 2–6-mer repeats on that chromosome. Format: TIFF Size: 249KB

Additional file 10: UCSC browser custom track information for detected tandem repeat variants. Format: ZIP Size: 1.5MB
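The thresholding described for Additional file 1 can be illustrated with a short Python sketch. It is not the authors' code: the function name `roc_points`, the toy scores, and the convention that a score at or above the threshold counts as a predicted variant are all assumptions made for illustration. Each returned point pairs one threshold with the false positive and true positive rates obtained by comparing the thresholded predictions to the observed (WGS-derived) variant labels.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) per threshold.

    scores   -- model predictions, higher meaning "more likely variant" (assumed)
    observed -- True where the repeat was observed to be variant
    """
    if len(scores) != len(observed):
        raise ValueError("scores and observed must have the same length")
    positives = sum(1 for o in observed if o)
    negatives = len(observed) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one variant and one non-variant")

    points = []
    # Every distinct score is tried as a threshold, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, o in zip(scores, observed) if s >= threshold and o)
        fp = sum(1 for s, o in zip(scores, observed) if s >= threshold and not o)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical toy data: four repeats with predicted scores and observed labels.
scores = [0.9, 0.8, 0.4, 0.2]
observed = [True, False, True, False]
points = roc_points(scores, observed)
for threshold, fpr, tpr in points:
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the false positive rate against the true positive rate for each model's predictions would give one curve per model, which is the kind of comparison the ROC figure describes.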










