Open Access Research article

Impact of constitutional copy number variants on biological pathway evolution

Maria Poptsova1, Samprit Banerjee2, Omer Gokcumen3, Mark A Rubin1 and Francesca Demichelis1,4,5*

Author Affiliations

1 Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY, USA

2 Department of Public Health, Weill Cornell Medical College, New York, NY, USA

3 Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA

4 Centre for Integrative Biology, University of Trento, Trento, Italy

5 Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA

BMC Evolutionary Biology 2013, 13:19  doi:10.1186/1471-2148-13-19

Published: 23 January 2013

Additional files

Additional file 1:

Supplementary Material.

Format: DOC Size: 101KB

Open Data

Additional file 2:

Table S1. Size-dependent enrichment analysis results for all the pathways considered in the study.

Format: XLS Size: 298KB

Additional file 3:

Figure S1. Distribution of depleted KEGG and Biocarta pathway classes. A. Distribution of depleted KEGG classes. The original distribution of pathway classes in the KEGG database is given on the left. The distribution of the KEGG classes depleted in CNVs, obtained with the size-dependent enrichment analysis, is given on the right. B. Distribution of depleted Biocarta categories. The original distribution of pathway categories in the Biocarta database is given on the left. The distribution of the Biocarta categories depleted in CNVs, obtained with the size-dependent enrichment analysis, is given on the right.

Figure S2. Histogram of CNV frequency differences in three population pairs: CEU-YRI, CEU-ASN and ASN-YRI. CNV frequency is the frequency of polymorphism, calculated as described in the Methods. Frequency differences are given as absolute values.

Figure S3. CNV-gene frequency heatmaps for 368 pathways. Heatmaps are constructed for CNV-gene pairs with 10 kb flanks.

Figure S4. Rearrangements around the SORD gene area in human and chimpanzee. The figure shows a Mauve block alignment of four homologous regions in human and chimpanzee. The first two regions are extracted from the human reference genome (build hg18): chr15:43,080,000-43,163,000 and chr15:42,917,000-43,079,000. The second two regions are taken from the chimpanzee genome (build panTro2): chr15:42,173,000-42,250,000 and chr15:41,950,000-42,030,000. The region of ~80 kb that includes the gene SORD underwent an inverted duplication before the split of human and chimpanzee. The active copy of SORD is encoded on the plus strand and is shown in orange. The copy of SORD on the minus strand, which most likely became a pseudogene, is shown in light orange. The CNV resulted from the loss of a region in the human genome from the inverted copy of SORD (see the empty box in the second alignment row). Analysis of the RepeatMasker annotation revealed that in the chimpanzee, the L1 element (L1PA3) is located right next to the CNV boundary. In the corresponding region in human, the same L1PA3 element was truncated from 50% of its length to 15%, and the Alu element (AluJb) was inserted exactly at the location of the CNV. However, only 62% of the Alu length remained in the sequence. Transposable element activity can also be seen in the promoter area of SORD. The remnants of a retrovirus (HERV9, 50% of length) are present in the promoter region of three copies of SORD, that is, all except the active human copy (first alignment row). Also, a full-length L1 element (L1PA6), most likely inserted into the retrovirus, is observed upstream of the active copy of SORD, and a truncated copy of this element (65% of length) remained upstream of the SORD pseudogene copy.

Figure S5. MAPK signaling pathway. A. CNV-gene frequency heatmap for the MAPK signaling pathway. Rows correspond to CNV-gene pairs and columns correspond to three HapMap populations: YRI, CEU and ASN. The values of the heatmap are CNV polymorphism frequencies (see Methods). B. Schematic representation of a fragment of the MAPK signaling pathway (adapted from KEGG). Highlighted in orange are the gene families, CACN and RASGRP, whose genes have CNVs with significant gene expression associations (P-value ≤ 0.01) and evidence for population differentiation. Examples of gene variant associations for three genes from the CACN family, CACNG2, CACNG6 and CACNG7, are given in separate boxes. An example of a gene variant association for the RASGRP family is given for the RASGRP4 gene.

Figure S6. Tuning Effect of Pathway Evolution. Different colors and shapes correspond to different enzymes. An increase in the concentration level of one enzyme (here green) can induce changes in the concentration levels of the linked enzymes (here blue and red). Over the course of evolution, this can lead to the recruitment of enzymes that perform better functions and, as a result, create a new pathway.

Format: PDF Size: 5.3MB

Additional file 4:

Table S2. Complete list of 4978 CNVs with Fst calculated for each population pair.

Format: XLS Size: 1.6MB

Additional file 5:

Table S3. List of pathways enriched for population-differentiated CNV-gene pairs.

Format: XLS Size: 168KB

Additional file 6:

Table S4. List of unique and shared pathways for Venn Diagram.

Format: XLS Size: 44KB

Additional file 7:

Table S5. List of unique and shared genes for Venn Diagram of Figure 4B.

Format: XLS Size: 19KB

Additional file 8:

Table S7. Enrichment analysis for the functional CNVs for CEU and YRI.

Format: XLS Size: 73KB

Additional file 9:

Table S6. List of CNVs that showed significant association with gene expression levels, annotated with population differentiation statistics, pathway information, and overlap with enhancers.

Format: XLS Size: 140KB