Phylogeography of mtDNA haplogroup R7 in the Indian peninsula

Gyaneshwer Chaubey1,2, Monika Karmin1, Ene Metspalu1, Mait Metspalu1, Deepa Selvi-Rani2, Vijay Kumar Singh2, Jüri Parik1, Anu Solnik1, B Prathap Naidu2, Ajay Kumar2,5, Niharika Adarsh2,5, Chandana Basu Mallick2,5, Bhargav Trivedi2,5, Swami Prakash2,5, Ramesh Reddy2,5, Parul Shukla2,5, Sanjana Bhagat2,5, Swati Verma2,5, Samiksha Vasnik2,5, Imran Khan2,5, Anshu Barwa2,5, Dipti Sahoo2,5, Archana Sharma2,5, Mamoon Rashid2,5, Vishal Chandra2,5, Alla G Reddy2, Antonio Torroni3, Robert A Foley4, Kumarasamy Thangaraj2, Lalji Singh2, Toomas Kivisild1,4* and Richard Villems1

1 Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia

2 Centre for Cellular and Molecular Biology, Hyderabad, India

3 Dipartimento di Genetica e Microbiologia, Università di Pavia, Via Ferrata 1, 27100 Pavia, Italy

4 Leverhulme Centre of Human Evolutionary Studies, The Henry Wellcome Building, University of Cambridge, Fitzwilliam Street, Cambridge, CB2 1QH, UK

5 Students from different universities and colleges of India who studied (as part of their curriculum) at CCMB, Hyderabad, India

BMC Evolutionary Biology 2008, 8:227  doi:10.1186/1471-2148-8-227

Published: 4 August 2008



Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian- and Indo-European-speaking populations, but there is no consensus yet on whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further east.


Here, we report the analysis of 35 novel complete mtDNA sequences from India that refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We infer from this that a subset of Munda-speaking groups acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 among Munda speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.


Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India is best explained by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly among Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, is observed. It is noteworthy that although R7 is autochthonous to India and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.