Open Access Research article

Marked variation in predicted and observed variability of tandem repeat loci across the human genome

Colm T O'Dushlaine¹ and Denis C Shields¹,²

¹ Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin 2, Ireland

² UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland

BMC Genomics 2008, 9:175 doi:10.1186/1471-2164-9-175

Published: 16 April 2008

Additional files

Additional file 1:

ROC curves illustrating the behaviour of the different models. Each point corresponds to a threshold dividing the predictions from a model into variants and non-variants; these predictions were then compared with the original WGS estimate of repeat variability. "Generic" represents predictions from the model trained on all the data. "Exonic" represents the model trained only on repeats within exons. "Combined" represents predictions in which each repeat was scored by the model specific to its unit length; for example, for every dimer repeat, the prediction was taken from a model trained on all dimers in the entire dataset. These length-specific models were derived for 2-, 3-, 4-, 5-, 6- and 7–12-mer repeats and then combined.

Format: TIFF Size: 154KB
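The thresholding behind these ROC curves can be illustrated with a short sketch. It assumes that each model outputs a numeric score per repeat and that the WGS-based variability call is available as a 0/1 label (1 = variant); the function names `roc_points` and `roc_auc` are hypothetical and are not taken from the PPR distribution. Every distinct score is tried as a cut-off, and the trapezoidal rule gives the area under the resulting curve.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) for every
    distinct score used as a cut-off.

    A repeat is predicted to be a variant when its score is >= the threshold;
    labels hold the observed (WGS-based) call, 1 = variant, 0 = non-variant.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one variant and one non-variant")
    points = []
    # Sweep thresholds from the strictest (highest score) to the most lenient.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


def roc_auc(points: List[Tuple[float, float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    fpr = [0.0] + [p[1] for p in points]
    tpr = [0.0] + [p[2] for p in points]
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


if __name__ == "__main__":
    # Toy scores and labels, invented purely for illustration.
    pts = roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
    for threshold, fpr, tpr in pts:
        print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
    print("AUC =", roc_auc(pts))  # 0.75 for this toy input
```

Because the lowest threshold admits every repeat, the last point is always (1, 1), so the curve closes without an explicit end point.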

Additional file 2:

The 209,214 repeats searched against the WGS sequences. For each repeat, the following fields are given: chromosome, start position, stop position, sequence of the tandem repeat unit, repeat unit copy number, repeat block length, number of unique repeat block lengths, heterozygosity of the repeat, the number of times each unique repeat block length arises in the search against the WGS sequences, and a unique repeat ID.

Format: ZIP Size: 4.2MB
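Additional file 2 lists, for each repeat, how often each distinct block length was observed among the WGS sequences, alongside a heterozygosity value. The formula is not stated here; a common choice, assumed in this sketch, is the expected heterozygosity H = 1 - sum(p_i^2), where p_i is the observed frequency of the i-th block length. The function name and the optional small-sample correction n/(n - 1) are illustrative assumptions, not necessarily what was used in the paper.

```python
from typing import Sequence


def heterozygosity(block_length_counts: Sequence[int], unbiased: bool = False) -> float:
    """Expected heterozygosity from the counts of each distinct repeat block length.

    block_length_counts[i] is the number of WGS observations of the i-th
    distinct block length. With unbiased=True the estimate is multiplied by
    n / (n - 1), a standard small-sample correction (an assumption here).
    """
    n = sum(block_length_counts)
    if n == 0:
        raise ValueError("no observations")
    h = 1.0 - sum((c / n) ** 2 for c in block_length_counts)
    if unbiased:
        if n < 2:
            raise ValueError("unbiased estimate needs at least two observations")
        h *= n / (n - 1)
    return h


if __name__ == "__main__":
    # Hypothetical repeat: one block length seen 6 times, another seen 2 times.
    print(heterozygosity([6, 2]))                            # 0.375
    print(heterozygosity([8]))                               # 0.0 (monomorphic)
    print(round(heterozygosity([6, 2], unbiased=True), 4))   # 0.4286
```

A repeat observed at a single block length has zero heterozygosity, which is the natural "invariant" case.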

Additional file 3:

Distribution of heterozygosity for repeats searched against the WGS dataset.

Format: TIFF Size: 32KB

Additional file 4:

Summaries of the different stepwise logistic and linear regression models tested when modelling all covariates. The pseudo-R² and R² are used here as estimates of model fit. For all models, popsize is used to weight the data, and only repeats with popsize ≤ 12 are modelled.

Format: DOC Size: 35KB

Additional file 5:

PPR script. Distribution of the program developed to predict potentially polymorphic repeats.

Format: ZIP Size: 1.3MB

Additional file 6:

Summary of models using all covariates.

Format: DOC Size: 142KB

Additional file 7:

Mean ratio of variant to invariant repeats over all chromosomes. Standard deviations from this mean (calculated in windows of 250 Mb) are shown as error bars.

Format: TIFF Size: 182KB
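One reading of Additional file 7 is that the ratio of variant to invariant repeats is computed in fixed-size genomic windows, and the mean and standard deviation of those window ratios summarise each chromosome. The sketch below follows that reading; the input format (start position plus a variant flag), the use of the sample standard deviation, skipping windows with no invariant repeats, and the function names are all assumptions.

```python
import statistics
from collections import defaultdict
from typing import Iterable, List, Tuple


def window_ratios(repeats: Iterable[Tuple[int, bool]], window_size: int) -> List[float]:
    """Variant:invariant ratio per window for one chromosome.

    repeats yields (start_position, is_variant) pairs. Windows containing no
    invariant repeats are skipped to avoid division by zero (an assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # window index -> [variant, invariant]
    for start, is_variant in repeats:
        counts[start // window_size][0 if is_variant else 1] += 1
    return [v / inv for _, (v, inv) in sorted(counts.items()) if inv > 0]


def mean_and_sd(ratios: List[float]) -> Tuple[float, float]:
    """Mean ratio and its sample standard deviation across windows."""
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    WINDOW = 250_000_000  # 250 Mb, as stated in the figure legend
    # Hypothetical positions: one window with 1 variant : 2 invariant,
    # a second window with 2 variant : 1 invariant.
    toy = [(10, True), (20, False), (30, False),
           (WINDOW + 5, True), (WINDOW + 6, True), (WINDOW + 7, False)]
    ratios = window_ratios(toy, WINDOW)
    print(ratios)  # [0.5, 2.0]
    print(mean_and_sd(ratios))
```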

Additional file 8:

Genomic distribution of the density of different repeat types and of mean popsize. Density of (A) all repeats, (B) variant repeats and (C) predicted variants over all chromosomes. Density is calculated as the total length of non-gapped, non-telomeric sequence divided by the number of observations of each repeat type, and is therefore lower when more observations are made. The distribution of mean popsize (D) is also shown.

Format: TIFF Size: 257KB
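The density measure defined for Additional file 8 is the usable sequence length divided by the number of observations, so it behaves as a spacing in base pairs per repeat and falls as repeats become more frequent. A minimal sketch, assuming gaps and telomeres are supplied as half-open (start, end) intervals lying within the chromosome; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def usable_length(chrom_length: int, excluded: Iterable[Tuple[int, int]]) -> int:
    """Chromosome length minus gapped and telomeric sequence.

    excluded holds half-open (start, end) intervals; overlapping intervals are
    merged first so that no base is subtracted twice.
    """
    merged = []
    for start, end in sorted(excluded):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return chrom_length - sum(end - start for start, end in merged)


def repeat_density(chrom_length: int, excluded: Iterable[Tuple[int, int]],
                   n_observations: int) -> float:
    """Base pairs of usable sequence per observed repeat (lower = denser)."""
    if n_observations <= 0:
        raise ValueError("need at least one observation")
    return usable_length(chrom_length, excluded) / n_observations


if __name__ == "__main__":
    # Toy chromosome of 1,000 bp with telomeres at both ends and two
    # overlapping gap intervals in the middle (all values invented).
    excluded = [(0, 100), (900, 1000), (400, 450), (420, 500)]
    print(usable_length(1000, excluded))      # 700
    print(repeat_density(1000, excluded, 7))  # 100.0
```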

Additional file 9:

Fraction of repeats of each unit length per chromosome. For each chromosome, the fraction is the count of each repeat type divided by the total number of 2–6-mer repeats on that chromosome.

Format: TIFF Size: 249KB
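The per-chromosome fractions in Additional file 9 can be sketched as follows, assuming that "repeat type" means the length of the repeat unit and that each repeat is given as a (chromosome, unit sequence) pair; the function name is hypothetical. Units outside the 2–6-mer range are excluded from both the counts and the denominator.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def unit_length_fractions(repeats: Iterable[Tuple[str, str]]) -> Dict[str, Dict[int, float]]:
    """For each chromosome, the fraction of its 2-6-mer repeats having each unit length."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for chrom, unit in repeats:
        k = len(unit)
        if 2 <= k <= 6:  # only 2-6-mers contribute to the denominator
            counts[chrom][k] += 1
    fractions = {}
    for chrom, per_length in counts.items():
        total = sum(per_length.values())
        fractions[chrom] = {k: n / total for k, n in sorted(per_length.items())}
    return fractions


if __name__ == "__main__":
    # Invented example: the 8-mer on chr1 is ignored.
    toy = [("chr1", "CA"), ("chr1", "CA"), ("chr1", "AAT"),
           ("chr1", "AAAAAAAC"), ("chr2", "GATA")]
    print(unit_length_fractions(toy))
```

Because each chromosome is normalised by its own 2–6-mer total, the fractions for a chromosome sum to 1, which makes chromosomes of very different sizes directly comparable.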

Additional file 10:

UCSC browser custom track information for detected tandem repeat variants.

Format: ZIP Size: 1.5MB

