Exome capture from saliva produces high quality genomic and metagenomic data
1 Department of Genetics, Stanford University, Stanford, CA 94305, USA
2 Departments of Human Genetics, and Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
3 The J. David Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
4 Departments of Microbiology, and Statistics, Oregon State University, Corvallis, OR 97331, USA
5 Department of Ecology and Evolution, Stony Brook University, Life Sciences Bldg, Room 640, Stony Brook, NY 11794, USA
6 Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
7 Program in Pharmaceutical Sciences and Pharmacogenomics, University of California, San Francisco, CA 94143, USA
8 Agilent Technologies, Genomics Division, Cedar Creek, TX 78612, USA
9 Translational Medicine, BGI – Shenzhen, Shenzhen, China
10 Stellenbosch University, Tygerberg, South Africa
11 Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
12 Institute for Human Genetics, and the Departments of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA 94143, USA
BMC Genomics 2014, 15:262. doi:10.1186/1471-2164-15-262. Published: 4 April 2014
Targeted capture of genomic regions reduces sequencing cost while generating higher coverage by allowing biomedical researchers to focus on specific loci of interest, such as exons. Targeted capture also has the potential to facilitate the generation of genomic data from DNA collected via saliva or buccal cells. DNA samples derived from these cell types tend to have a lower human DNA yield, may be degraded with age, and may be contaminated with bacteria or other ambient oral microbiota. However, thousands of samples have previously been collected from these cell types, and saliva collection has the advantage of being non-invasive and appropriate for a wide variety of research.
We demonstrate successful enrichment and sequencing of 15 South African KhoeSan exomes and 2 full genomes with samples initially derived from saliva. The expanded exome dataset enables us to characterize genetic diversity free from ascertainment bias for multiple KhoeSan populations, including new exome data from six HGDP Namibian San, revealing substantial population structure across the Kalahari Desert region. Additionally, we discover and independently verify thirty-one previously unknown KIR alleles using methods we developed to accurately map and call the highly polymorphic HLA and KIR loci from exome capture data. Finally, we show that exome capture of saliva-derived DNA yields sufficient non-human sequences to characterize oral microbial communities, including detection of bacteria linked to oral disease (e.g., Prevotella melaninogenica). For comparison, two samples were sequenced using standard full genome library preparation without exome capture, and we found no systematic bias in metagenomic information between exome-captured and non-captured data.
DNA from human saliva samples, collected and extracted using standard procedures, can be used to successfully sequence high quality human exomes, and metagenomic data can be derived from non-human reads. We find that individuals from the Kalahari carry a higher oral pathogenic microbial load than samples surveyed in the Human Microbiome Project. Additionally, rare variants present in the exomes suggest strong population structure across different KhoeSan populations.