Screening genome-wide DNA methylation CpG sites via training and testing data utilizing surrogate variables

Ray, Meredith; Tong, Xin; Zhang, Hongmei; Karmaus, Wilfred

doi:10.1186/1471-2105-15-S10-P4

Volume 15 Supplement 10

UT-KBRIN Bioinformatics Summit 2014: Abstracts

Poster presentation
Open access
Published: 29 September 2014

BMC Bioinformatics volume 15, Article number: P4 (2014)

Background

Because genome-wide methylation data are high dimensional, it is desirable to screen cytosine-phosphate-guanine dinucleotide (CpG) DNA methylation sites for association with single-nucleotide polymorphisms (SNPs), covariates of interest, and/or their interactions before performing more complicated analyses. SNPs and covariates of interest may not fully explain the variation in methylation, so it is important to account for variation introduced by other, unknown factors. Furthermore, CpG sites screened from one data set may be inconsistent with those screened from another, so it is equally important to improve the reproducibility of the selected CpG sites.

Materials and methods

A user-friendly R package implementing a training-testing screening method (ttScreening) was developed to achieve these goals. It gives users the flexibility to choose among three screening methods: the proposed training and testing method, a method controlling the false discovery rate (FDR), and a method controlling the significance level with the Bonferroni correction.
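
To illustrate the two comparison methods, the following minimal Python sketch selects sites from a list of per-site p-values. It is not the ttScreening code. The abstract does not say which FDR procedure the package uses, so the Benjamini-Hochberg step-up procedure is assumed here, and the function names, thresholds, and p-values are hypothetical.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of p-values at or below the Bonferroni threshold alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg_select(p_values, q=0.05):
    """Return indices selected by the Benjamini-Hochberg step-up procedure.

    Assumption: BH stands in for the unspecified FDR method of the abstract.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value satisfies p_(k) <= q * k / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for six CpG sites.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
bonf = bonferroni_select(p_values)          # sites 0 and 1
bh = benjamini_hochberg_select(p_values)    # sites 0, 1 and 2
print(bonf, bh)
```

On these made-up values the Bonferroni threshold is 0.05 / 6, so it keeps two sites, while the step-up procedure keeps three.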

Results

Linear regressions were applied in the screening process, with the methylation of a CpG site as the dependent variable and a single SNP, a covariate, and possibly their interactions as independent variables. Surrogate variable analysis was included to adjust for the effects of unknown factors. Randomly chosen training and testing samples were used to estimate and test the effects, respectively. Simulations under different scenarios were run to assess the robustness and sensitivity of the method and to compare it with the other two screening methods. In almost all simulation scenarios, the training and testing screening method outperformed the other methods in correctly identifying important CpG sites; in the remaining scenarios, ttScreening performed equally well. We applied ttScreening to 40,000 CpG sites, screening them for association with smoking and forced vital capacity. The ttScreening method selected 9 CpG sites, whereas the other two methods selected none.
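
The training and testing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: it uses a single covariate, omits surrogate variable analysis, approximates the slope t-test with a normal distribution, and keeps a site only if it is significant in both training and testing data in at least half of the random splits. All function names, parameter names, defaults, and simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y ~ x (normal approximation to the t-test)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: significant unless the slope itself is zero
        return 0.0 if slope != 0.0 else 1.0
    return 2.0 * (1.0 - NormalDist().cdf(abs(slope / se)))


def tt_screen(x, methylation, n_splits=50, train_frac=2 / 3,
              alpha_train=0.05, alpha_test=0.05, min_pass_rate=0.5, seed=0):
    """Return indices of sites significant in training AND testing data
    in at least min_pass_rate of the random splits (hypothetical defaults)."""
    rng = random.Random(seed)
    n = len(x)
    n_train = int(round(n * train_frac))
    passes = [0] * len(methylation)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        for j, y in enumerate(methylation):
            # Step 1: screen the site on the training samples.
            if slope_p_value([x[i] for i in train], [y[i] for i in train]) >= alpha_train:
                continue
            # Step 2: confirm the effect on the held-out testing samples.
            if slope_p_value([x[i] for i in test], [y[i] for i in test]) < alpha_test:
                passes[j] += 1
    return [j for j, c in enumerate(passes) if c / n_splits >= min_pass_rate]


# Simulated data: 60 samples, 20 sites, of which sites 0-2 depend on the covariate.
# Values are on an unbounded scale, as for M-values rather than beta values.
rng = random.Random(1)
n_samples, n_sites = 60, 20
true_sites = [0, 1, 2]
x = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
methylation = [
    [(2.0 if j in true_sites else 0.0) * xi + rng.gauss(0.0, 0.5) for xi in x]
    for j in range(n_sites)
]
selected = tt_screen(x, methylation)
print(selected)
```

Requiring a site to pass in a majority of splits is one simple way to favour reproducible selections; the pass-rate cutoff and the two significance levels are tuning choices in this sketch.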

Conclusions

Our simulation results indicate that ttScreening performs better than FDR-based screening and at least as well as Bonferroni-based screening in correctly identifying CpG sites that are associated with other variables. The package is computationally efficient and user-friendly, which suggests that it is suitable for dimension reduction in high-dimensional data and broadly applicable beyond epigenetic studies. The package can be downloaded at [1].

References

1. ttScreening package. [http://cran.r-project.org/web/packages/ttScreening/index.html]

Author information

Authors and Affiliations

Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, 29208, USA
Meredith Ray & Xin Tong
Division of Epidemiology, Biostatistics, and Environmental Health, University of Memphis, Memphis, TN, 38152, USA
Hongmei Zhang & Wilfred Karmaus

Corresponding author

Correspondence to Hongmei Zhang.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ray, M., Tong, X., Zhang, H. et al. Screening genome-wide DNA methylation CpG sites via training and testing data utilizing surrogate variables. BMC Bioinformatics 15 (Suppl 10), P4 (2014). https://doi.org/10.1186/1471-2105-15-S10-P4
