Large-scale inference of the point mutational spectrum in human segmental duplications
1 Centre for Molecular Biology and Neuroscience, Institute of Medical Microbiology, Rikshospitalet University Hospital, NO-0027 Oslo, Norway
2 Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316 Oslo, Norway
3 Department of Tumor Biology, Institute for Cancer Research, Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway
4 Department of Medical Informatics, Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway
BMC Genomics 2009, 10:43 doi:10.1186/1471-2164-10-43
Published: 22 January 2009
Recent segmental duplications are relatively large (≥ 1 kb) genomic regions of high sequence identity (≥ 90%). They cover approximately 4–5% of the human genome and play important roles in gene evolution and genomic disease. The DNA sequence differences between copies of a segmental duplication represent the result of various mutational events over time, since any two duplication copies originated from the same ancestral DNA sequence. Based on this fact, we have developed a computational scheme for inference of point mutational events in human segmental duplications, which we collectively term duplication-inferred mutations (DIMs). We have characterized these nucleotide substitutions by comparing them with high-quality SNPs from dbSNP, both in terms of sequence context and frequency of substitution types.
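The principle behind such inference can be illustrated with a minimal sketch. It assumes that two copies of a duplication have already been aligned to equal length, with `-` marking gaps; the function name `mismatch_sites` and the input format are hypothetical and are not taken from the scheme developed in this work. Each column where both copies carry a different unambiguous base is reported as a candidate substitution. A pairwise comparison alone does not reveal which copy carries the derived base.

```python
VALID_BASES = set("ACGT")


def mismatch_sites(copy_a, copy_b):
    """List candidate substitutions between two aligned duplication copies.

    Both inputs are assumed to be rows of the same pairwise alignment,
    so they must have equal length. Columns containing a gap ('-') or an
    ambiguous base (e.g. 'N') in either copy are skipped.

    Returns a list of (alignment_position, base_in_a, base_in_b) tuples,
    with 0-based positions.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("aligned copies must have equal length")

    sites = []
    for pos, (a, b) in enumerate(zip(copy_a.upper(), copy_b.upper())):
        if a in VALID_BASES and b in VALID_BASES and a != b:
            sites.append((pos, a, b))
    return sites


if __name__ == "__main__":
    # Toy alignment: one substitution (C/T) and one gapped column.
    print(mismatch_sites("ACGT-A", "ATGTCA"))
```

In this toy alignment the gapped column is ignored and only the C/T difference at position 1 is reported.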
Overall, DIMs show a lower ratio of transitions to transversions than SNPs, although this ratio approaches that of SNPs when considering DIMs within the most recent duplications. Our findings indicate that DIMs and SNPs in general are caused by similar mutational mechanisms, with some deviations at the CpG dinucleotide. Furthermore, we discover a large number of reference SNPs that coincide with computationally inferred DIMs. This overlap reflects how sequence variation in duplicated sequences can be misinterpreted as ordinary allelic variation.
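For readers unfamiliar with the transition/transversion ratio, the following minimal sketch shows how it can be computed from a list of substitutions. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes; all other substitutions are transversions. The function names and the input format (a list of base pairs) are assumptions made for illustration and do not reproduce the analysis pipeline used here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID_BASES = PURINES | PYRIMIDINES


def is_transition(a, b):
    """True if a -> b stays within purines or within pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions in (base, base) pairs.

    Pairs with identical bases or with characters other than A, C, G, T
    are ignored. Raises ValueError if no transversions are present,
    since the ratio is then undefined.
    """
    transitions = 0
    transversions = 0
    for a, b in substitutions:
        a, b = a.upper(), b.upper()
        if a not in VALID_BASES or b not in VALID_BASES or a == b:
            continue
        if is_transition(a, b):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    # Three transitions (A/G, C/T, G/A) and one transversion (A/C).
    toy = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]
    print(ts_tv_ratio(toy))
```

The same function could be applied separately to a set of DIMs and a set of SNPs to compare the two ratios.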
In summary, we show how DNA sequence analysis of segmental duplications can provide a genome-wide mutational spectrum that mirrors recent genome evolution. The inferred set of nucleotide substitutions represents a valuable complement to SNPs for the analysis of genetic variation and point mutagenesis.