Research article

Characterizing the genetic differences between two distinct migrant groups from Indo-European and Dravidian speaking populations in India

Mohammad Ali1, Xuanyao Liu2,3, Esakimuthu Nisha Pillai2, Peng Chen2, Chiea-Chuen Khor4, Rick Twee-Hee Ong2 and Yik-Ying Teo1,2,3,4,5,6*

Author Affiliations

1 Life Sciences Institute, National University of Singapore, Singapore, Singapore

2 Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore

3 NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, Singapore

4 Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore

5 Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore

6 Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, Blk S16, Level 7, 6 Science Drive 2, Singapore 117546, Singapore


BMC Genetics 2014, 15:86  doi:10.1186/1471-2156-15-86

Published: 22 July 2014

Abstract

Background

India is home to many ethnically and linguistically diverse populations. It is hypothesized that a history of invasions by people from Persia and Central Asia, who are referred to as Aryans in the Hindu holy scriptures, played a defining role in shaping the Indian population canvas. A shift in spoken languages from Dravidian to Indo-European languages around 1500 B.C. is central to the Aryan Invasion Theory. Here we investigate the genetic differences between two sub-populations of India: (1) the Indo-European-speaking Gujarati Indians, with genome-wide data from the International HapMap Project; and (2) the Dravidian-speaking Tamil Indians, with genome-wide data from the Singapore Genome Variation Project.

Results

We applied three population genetics measures to identify genomic regions that are significantly differentiated between the two Indian populations, which originate from the north and the south of India. These measures singled out genomic regions with: (i) SNPs exhibiting significant differences in allele frequencies between the two Indian populations; and (ii) differential signals of positive natural selection, as quantified by the integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH). One of the regions that emerged spans the SLC24A5 gene, which has been functionally shown to affect skin pigmentation and which shows a higher degree of genetic sharing between Gujarati Indians and Europeans.
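To make the three measures concrete, here is a minimal Python sketch of how such per-SNP statistics are commonly computed. It is an illustration under stated assumptions, not the authors' pipeline. The abstract does not name the allele-frequency statistic, so Hudson's F_ST is assumed here as one standard choice. Haplotypes are assumed to be phased strings of '0'/'1' alleles, with SNP positions given on a genetic map. The 0.05 EHH cutoff is a common default rather than a value taken from the study. The integration is simplified: it stops at the last SNP whose EHH is still above the cutoff. iHS is normally standardized within derived-allele-frequency bins, which this sketch omits. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Per-SNP Hudson F_ST (assumed statistic) from allele frequencies p1, p2
    and the number of sampled chromosomes n1, n2 (each must be >= 2)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # den is 0 only if both samples are fixed for the same allele
    return num / den if den > 0 else math.nan


def ehh(haps: Sequence[str], core: int, end: int) -> float:
    """Probability that two randomly drawn haplotypes are identical at every
    SNP from index `core` through index `end` (either direction)."""
    n = len(haps)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in haps)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ihh(haps: Sequence[str], core: int, pos: Sequence[float],
        cutoff: float = 0.05) -> float:
    """Integrated EHH: trapezoidal area under the EHH curve on both sides of
    the core SNP, stopping before EHH first drops below `cutoff`."""
    start = ehh(haps, core, core)
    if start < cutoff:
        return 0.0
    total = 0.0
    for step in (-1, 1):  # walk left, then right, away from the core SNP
        j, prev = core, start
        while 0 <= j + step < len(pos):
            nxt = j + step
            cur = ehh(haps, core, nxt)
            if cur < cutoff:
                break
            total += 0.5 * (prev + cur) * abs(pos[nxt] - pos[j])
            j, prev = nxt, cur
    return total


def xp_ehh_unstandardized(haps_a: Sequence[str], haps_b: Sequence[str],
                          core: int, pos: Sequence[float]) -> float:
    """ln(iHH_A / iHH_B), using all chromosomes of each population."""
    a, b = ihh(haps_a, core, pos), ihh(haps_b, core, pos)
    return math.log(a / b) if a > 0 and b > 0 else math.nan


def ihs_unstandardized(haps: Sequence[str], core: int, pos: Sequence[float],
                       ancestral: str = "0") -> float:
    """ln(iHH_ancestral / iHH_derived) within a single population."""
    anc = [h for h in haps if h[core] == ancestral]
    der = [h for h in haps if h[core] != ancestral]
    a, d = ihh(anc, core, pos), ihh(der, core, pos)
    return math.log(a / d) if a > 0 and d > 0 else math.nan


def standardize(scores: Sequence[float]) -> List[float]:
    """Genome-wide z-scores; NaN entries are ignored for the mean and SD
    and stay NaN in the output."""
    finite = [s for s in scores if not math.isnan(s)]
    mu, sd = statistics.mean(finite), statistics.stdev(finite)
    return [(s - mu) / sd for s in scores]


# Toy usage with hypothetical data: three SNPs at 0, 1 and 2 cM.
pos = [0.0, 1.0, 2.0]
pop_a = ["000"] * 4                     # one long shared haplotype
pop_b = ["000", "000", "111", "111"]    # two haplotype classes
print(xp_ehh_unstandardized(pop_a, pop_b, core=1, pos=pos))  # about ln 3, i.e. 1.0986
```

A positive unstandardized XP-EHH, as in the toy example, indicates longer haplotypes around the core SNP in the first population. Scans usually standardize these raw scores across the genome and then flag regions that contain clusters of extreme values.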

Conclusions

Our finding points to gene flow from Europe to North India, which provides an explanation for the lighter skin tones of North Indians compared with South Indians.

Keywords:
Positive selection; Long haplotype; Population diversity