Open Access Research article

Analysis of CpG methylation sites and CGI among human papillomavirus DNA genomes

Silvia C Galván1*, Martha Martínez-Salazar1, Víctor M Galván2, Rocío Méndez3, Gibran T Díaz-Contreras4, Moisés Alvarado-Hermida4, Rogelio Alcántara-Silva4 and Alejandro García-Carrancá1,3

Author Affiliations

1 Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, México

2 Departamento El Hombre y su Ambiente, Universidad Autónoma Metropolitana-Xochimilco, México

3 Instituto Nacional de Cancerología, Secretaría de Salud, México

4 Facultad de Ingeniería, Universidad Nacional Autónoma de México, México


BMC Genomics 2011, 12:580  doi:10.1186/1471-2164-12-580

Published: 25 November 2011



The human papillomavirus (HPV) genome is divided into early and late coding sequences, comprising 8 open reading frames (ORFs), and a regulatory region, the long control region (LCR). Viral gene expression may be regulated through epigenetic mechanisms, including cytosine methylation at CpG dinucleotides. We analyzed the distribution of CpG sites and CpG islands/clusters (CGI) among 92 different HPV genomes grouped according to their preferential tropism: cutaneous or mucosal. For each viral type, we calculated the proportion of CpG sites (PCS) in each ORF and the expected CpG values.
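
As a rough illustration of the quantities named above, the following Python sketch counts CpG sites in a single sequence and derives a PCS and an observed/expected (O/E) ratio. The function name `cpg_stats` is hypothetical. Two definitions are assumptions, since the abstract does not state them: PCS is taken as the number of CpG sites divided by sequence length, and the expected CpG count is taken as (number of C × number of G) / length, a commonly used estimate that may differ from the one used in the study.

```python
def cpg_stats(seq: str) -> dict:
    """Count CpG sites and derive PCS and O/E ratio for one sequence.

    Assumptions (not taken from the article):
      - PCS = observed CpG count / sequence length
      - expected CpG count = (C count * G count) / sequence length
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact number of sites.
    observed = seq.count("CG")
    expected = (c_count * g_count) / n if n else 0.0
    pcs = observed / n if n else 0.0
    oe_ratio = observed / expected if expected else 0.0
    return {
        "length": n,
        "observed": observed,
        "expected": expected,
        "pcs": pcs,
        "oe_ratio": oe_ratio,
    }


if __name__ == "__main__":
    # Toy sequence for demonstration only; not an HPV sequence.
    print(cpg_stats("ACGTCGGA"))
```

Applied per ORF and per complete genome, a function of this kind would yield the values that are then compared across viral types; an O/E ratio below 1 indicates CpG underrepresentation under the assumed expectation.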


CpGs are underrepresented in viral genomes. We found a positive correlation between observed and expected CpG values, with mucosal high-risk (HR) virus types showing the smallest observed/expected (O/E) ratios. The ranges of the PCS were similar for most genomic regions except E4, where the majority of CpGs are found within islands/clusters. Every E2/E4 region contains at least one CGI. The PCS of each viral ORF correlated positively with those of the other ORFs, except for the LCR against four ORFs and E6 against three other ORFs. The distribution of CpG islands/clusters among HPV groups is heterogeneous: mucosal HR-HPV types have both fewer and shorter islands than cutaneous and mucosal low-risk (LR) HPVs (all differences statistically significant).
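
To make the notion of a CpG island/cluster concrete, the sketch below scans a sequence with a sliding window and merges overlapping windows that pass two thresholds. The function name `find_cgi` and all parameter values are assumptions for illustration: the thresholds (GC content of at least 0.5, O/E ratio of at least 0.6, 200-nucleotide window) follow the widely used Gardiner-Garden and Frommer style of criteria, which are not necessarily the criteria applied in this study.

```python
def find_cgi(seq: str, window: int = 200, step: int = 1,
             min_gc: float = 0.5, min_oe: float = 0.6) -> list:
    """Return merged (start, end) intervals of windows passing the CGI thresholds.

    Coordinates are 0-based and end-exclusive. Thresholds are illustrative
    defaults, not the article's stated criteria.
    """
    seq = seq.upper()
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        c_count = win.count("C")
        g_count = win.count("G")
        gc_content = (c_count + g_count) / window
        expected = (c_count * g_count) / window
        oe_ratio = win.count("CG") / expected if expected else 0.0
        if gc_content >= min_gc and oe_ratio >= min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


if __name__ == "__main__":
    # Synthetic sequence: a CpG-rich block flanked by AT repeats.
    toy = "AT" * 50 + "CG" * 50 + "AT" * 50
    for start, end in find_cgi(toy, window=20):
        print(start, end, end - start)
```

The number of intervals returned and their lengths correspond to the two properties compared between viral groups here, CGI number and CGI size.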


There is a difference between viral and cellular CpG underrepresentation. Complete-genome PCS values are significantly correlated, whereas several pairs of genomic regions lack correlation, especially those involving the LCR and E6. The L2 and L1 ORFs behave in the opposite way to the oncogenes E6 and E7: the former possess relatively few CpG sites clustered in CGIs, whereas the oncogenes possess a relatively high number of CpG sites not associated with CGIs. E2/E4 is the only region that contains at least one CGI in all HPVs, and it shows a higher content of CpG sites in every HPV type with an identified E4. The mucosal HR-HPVs show both the shortest CGI size, followed by the mucosal LR-HPVs and lastly by the cutaneous viral subgroup, and a trend toward the lowest CGI number, followed by the cutaneous viral subgroup and lastly by the mucosal LR-HPVs.