Research article (Open Access; Highly Accessed)

Autosomal and uniparental portraits of the native populations of Sakha (Yakutia): implications for the peopling of Northeast Eurasia

Sardana A Fedorova (1,2), Maere Reidla (2)*, Ene Metspalu (2), Mait Metspalu (3), Siiri Rootsi (3), Kristiina Tambets (3), Natalya Trofimova (4), Sergey I Zhadanov (5), Baharak Hooshiar Kashani (6), Anna Olivieri (6), Mikhail I Voevoda (7), Ludmila P Osipova (8), Fedor A Platonov (9), Mikhail I Tomsky (1), Elza K Khusnutdinova (10,4), Antonio Torroni (6) and Richard Villems (11,2,3)

Author Affiliations

1 Department of Molecular Genetics, Yakut Research Center of Complex Medical Problems, Russian Academy of Medical Sciences and North-Eastern Federal University, Yakutsk, Russia

2 Department of Evolutionary Biology, University of Tartu, Tartu, Estonia

3 Estonian Biocentre, Tartu, Estonia

4 Institute of Biochemistry and Genetics, Ufa Scientific Center, Russian Academy of Sciences, Ufa, Russia

5 Department of Anthropology, University of Pennsylvania, Philadelphia, USA

6 Dipartimento di Biologia e Biotecnologie, Università di Pavia, Pavia, Italy

7 Institute of Internal Medicine, Siberian Branch of Russian Academy of Medical Sciences, Novosibirsk, Russia

8 Institute of Genetics and Cytology, Siberian Branch of Russian Academy of Sciences, Novosibirsk, Russia

9 Institute of Health, North-Eastern Federal University, Yakutsk, Russia

10 Department of Genetics and Fundamental Medicine, Bashkir State University, Ufa, Russia

11 Estonian Academy of Sciences, Tallinn, Estonia


BMC Evolutionary Biology 2013, 13:127  doi:10.1186/1471-2148-13-127

Published: 19 June 2013

Additional files

Additional file 1:

Genotyping information for 701 mtDNAs from five native populations of Sakha and 128 mtDNAs from Dolgans in Taymyr.

Format: XLSX; Size: 76 KB

Open Data

Additional file 2:

Phylogenetic tree based on 80 complete mtDNA sequences from haplogroup Z. Mutations relative to the RSRS [35] are indicated on the branches. Capital letters are used for transitions and lowercase letters for transversions. Heteroplasmies are labeled using the IUPAC code and capital letters (e.g., 73R). Recurrent mutations are underlined. Reversal mutations are suffixed with “!”. Insertions are indicated by a dot followed by the position number and type of inserted nucleotide(s). Deletions are indicated by a “d” after the deleted nucleotide position. The control-region sequence is not reported for sample As30. For phylogeny construction, the highly variable site 16519 and the length variation in the poly-C stretches at nps 303-315 and 16184-16194 were not used. A-C transversions at nps 16182 and 16183 were excluded because of their dependence on the presence of the C-T transition at np 16189. The box containing the sample ID is color coded according to the geographic origin of the sample, and the accession number and/or the publication from which it was retrieved is given below the ID. Coalescence time estimates, expressed in thousands of years ago, are shown next to clade labels and were calculated based on the rho statistic and its standard deviation as in [59,90]. The calculator provided by [91] was used to convert the rho statistic and its error ranges to age estimates with 95% confidence intervals. Sample As30 was excluded from the calculations because its control region is not reported.

Format: XLSX; Size: 35 KB
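The age estimates mentioned in the caption above rest on the rho statistic, the average number of mutations separating the sampled sequences from the root of a clade. As a minimal sketch, assuming a commonly used formulation in which each branch contributes its mutation count weighted by the number of sampled sequences descending from it, rho and its standard deviation might be computed as follows. The toy tree, the function name `rho_and_sigma`, and the linear `YEARS_PER_MUTATION` rate are illustrative assumptions, not values from the article. The article converts rho to ages with the calculator of [91], which this sketch does not reproduce.

```python
import math

# Hypothetical placeholder rate for illustration only (NOT from the article).
YEARS_PER_MUTATION = 3000.0


def rho_and_sigma(branches, n_samples):
    """Return (rho, sigma) for a clade.

    branches: list of (mutations_on_branch, sampled_descendants) pairs,
              one pair per branch of the tree below the clade root.
    n_samples: total number of sampled sequences in the clade.
    """
    # rho: mean number of mutations from the root to each sampled sequence.
    rho = sum(l * n for l, n in branches) / n_samples
    # sigma: each branch weighted by the square of its descendant count.
    sigma = math.sqrt(sum(l * n * n for l, n in branches)) / n_samples
    return rho, sigma


# Toy clade with 4 sampled sequences (made-up numbers):
# one internal branch (1 mutation) leading to 2 samples whose terminal
# branches carry 1 and 0 mutations, plus two samples attached directly
# to the root by branches of 2 and 1 mutations.
toy_branches = [(1, 2), (1, 1), (0, 1), (2, 1), (1, 1)]
rho, sigma = rho_and_sigma(toy_branches, n_samples=4)

# Naive linear conversion to years, assuming the placeholder rate above.
age_years = rho * YEARS_PER_MUTATION
age_sd_years = sigma * YEARS_PER_MUTATION
print(f"rho={rho:.3f}, sigma={sigma:.3f}, age={age_years:.0f} +/- {age_sd_years:.0f} years")
```

In the toy tree the four root-to-tip distances are 2, 1, 2 and 1 mutations, so rho is their mean. A sample whose control region is missing (such as As30) would have to be left out of `branches` and `n_samples`, since its distance to the root cannot be counted on the same sites.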

Additional file 3:

Population frequencies of mtDNA Z sub-haplogroups.

Format: XLSX; Size: 15 KB

Additional file 4:

Phylogenetic tree based on 37 complete mtDNA sequences from haplogroup R1. Mutations relative to the RSRS [35] are indicated on the branches. Capital letters are used for transitions and lowercase letters for transversions. Heteroplasmies are labeled using the IUPAC code and capital letters (e.g., 73R). Recurrent mutations are underlined. Reversal mutations are suffixed with “!”. Insertions are indicated by a dot followed by the position number and type of inserted nucleotide(s). Deletions are indicated by a “d” after the deleted nucleotide position. For phylogeny construction, the length variation in the poly-C stretches at nps 303-315 and 16184-16194 was not used. A-C transversions at nps 16182 and 16183 were excluded because of their dependence on the presence of the C-T transition at np 16189. The box containing the sample ID is color coded according to the geographic origin of the sample, and the accession number and/or the publication from which it was retrieved is given below the ID. Coalescence time estimates, expressed in thousands of years ago, are shown next to clade labels and were calculated based on the rho statistic and its standard deviation as in [59,90]. The calculator provided by [91] was used to convert the rho statistic and its error ranges to age estimates with 95% confidence intervals. Sample Azeri10 was excluded from the calculations because of multiple heteroplasmic sites in the sequence.

Format: XLSX; Size: 26 KB

Additional file 5:

Frequencies of Y-STR haplotypes in the native populations of Sakha. Designations of populations are as in Figure 1. Sample sizes are given in parentheses.

Format: XLSX; Size: 13 KB

Additional file 6:

Y-STR haplotypes of 398 samples from haplogroup N1c.

Format: XLSX; Size: 17 KB

Additional file 7:

Y-STR haplotypes of haplogroup C3* in the native populations of Sakha.

Format: XLSX; Size: 10 KB

Additional file 8:

Phylogenetic network of the Y-chromosome haplogroup C3*. This median-joining network of C3* haplotypes was constructed from STR data (11 loci: DYS385a, DYS385b, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438 and DYS439) from 121 individuals using the program Network 4.6.1.0. Circles represent microsatellite haplotypes; the areas of the circles and sectors are proportional to haplotype frequencies according to the data presented in Additional file 9. Populations from Sakha and the linguistic affiliations of the rest of the samples are indicated by color.

Format: PDF; Size: 87 KB

Additional file 9:

Y-STR haplotypes of 121 samples from haplogroup C3*.

Format: XLSX; Size: 68 KB

Additional file 10:

Details of samples included in autosomal SNP data analyses.

Format: XLSX; Size: 19 KB

Additional file 11:

ADMIXTURE plots from K = 2 to K = 14. At each K, the run with the highest log-likelihood among 100 runs is plotted. Each vertical column corresponds to one sample and represents the probability that its ancestry derives from each of the constructed ancestral populations, which are distinguished by color.

Format: PDF; Size: 1.1 MB
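The selection rule described in the caption above (at each K, keep the run with the highest log-likelihood) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `best_run_per_k`, the dictionary layout, and the log-likelihood values are hypothetical and are not taken from the article or from ADMIXTURE output files.

```python
def best_run_per_k(log_likelihoods):
    """Pick, for every K, the run with the highest log-likelihood.

    log_likelihoods: dict mapping K -> list of log-likelihoods, one per run.
    Returns a dict mapping K -> (run_index, log_likelihood).
    """
    best = {}
    for k, lls in log_likelihoods.items():
        # Index of the run whose log-likelihood is largest for this K.
        run_index = max(range(len(lls)), key=lls.__getitem__)
        best[k] = (run_index, lls[run_index])
    return best


# Made-up log-likelihoods for three runs at K = 2 and K = 3.
toy_lls = {
    2: [-1050.2, -1049.8, -1051.0],
    3: [-1002.5, -1003.1, -1001.9],
}
best = best_run_per_k(toy_lls)
for k in sorted(best):
    run_index, ll = best[k]
    print(f"K={k}: best run index {run_index}, log-likelihood {ll}")
```

With real data, each list would hold the 100 log-likelihoods obtained at that K, and the chosen run index would identify which ancestry matrix to plot.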

Additional file 12:

ADMIXTURE analysis from K = 2 to K = 14. a) Log-likelihood scores (LLs) of all the 14 × 100 runs of ADMIXTURE. The inset shows the extent of the variation in LLs within the fractions (5%, 10%, 20%) of runs that reached the highest LLs. b) Box-and-whisker plot of the cross-validation indices of all 1400 runs of ADMIXTURE.

Format: PDF; Size: 116 KB

Additional file 13:

ADMIXTURE plots at K = 13. The ten runs with the highest log-likelihoods are plotted.

Format: PDF; Size: 1.4 MB

Additional file 14:

Correlation and partial correlation coefficients, r (P-values), between genetic, geographic, and linguistic distances.

Format: XLSX; Size: 12 KB
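Correlations between distance matrices, as tabulated in the file above, are typically assessed with a Mantel test, in which the rows and columns of one matrix are permuted together to obtain a P-value. The following is a minimal sketch of a simple (not partial) Mantel test, assuming a one-sided permutation test on the Pearson correlation. The function names, the number of permutations, and the toy matrices are illustrative assumptions and do not reproduce the article's data or software settings.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p) for distance matrices a and b of the same size."""
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    r_obs = pearson(upper_triangle(a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of `a` together (relabel populations).
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= r_obs:
            hits += 1
    # One-sided P-value; the observed arrangement counts as one permutation.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Made-up genetic and geographic (km) distances among four populations.
genetic = [[0.0, 0.1, 0.3, 0.5],
           [0.1, 0.0, 0.2, 0.4],
           [0.3, 0.2, 0.0, 0.2],
           [0.5, 0.4, 0.2, 0.0]]
geographic = [[0, 100, 400, 900],
              [100, 0, 300, 800],
              [400, 300, 0, 500],
              [900, 800, 500, 0]]
r, p = mantel(genetic, geographic)
print(f"Mantel r = {r:.3f}, P = {p:.3f}")
```

A partial Mantel test, which controls for a third matrix such as linguistic distance, would additionally regress both matrices on the third one before correlating the residuals; that step is omitted here.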

Additional file 15:

Additional information on the mtDNA and Y chromosome data used in the Mantel test.

Format: XLSX; Size: 14 KB

Additional file 16:

Additional information on the native populations of Sakha.

Format: PDF; Size: 371 KB