This article is part of the supplement: SNP-SIG 2013: Identification and annotation of genetic variants in the context of structure, function, and disease
In silico comparative characterization of pharmacogenomic missense variants
- Equal contributors
1 The Buck Institute for Research on Aging, Novato, CA, USA
2 Department of Genetics, Stanford University, Stanford, CA, USA
3 Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
BMC Genomics 2014, 15(Suppl 4):S4 doi:10.1186/1471-2164-15-S4-S4
Published: 20 May 2014
Missense pharmacogenomic (PGx) variants are amino acid substitutions that potentially affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to drug therapies. PGx variants have not been investigated as deeply as disease-associated variants. The ability to computationally predict future PGx variants is desirable; however, it is not clear which data sets should be used or which features are useful for this purpose. We therefore carried out a comparative characterization of PGx variants against annotated neutral and disease variants from UniProt, to test the predictive power of sequence conservation and structural information in discriminating among these three groups.
We selected 126 high-quality PGx variants from PharmGKB and created two data sets: one containing 416 variants with both structural and sequence information, and the other containing 1,265 variants with sequence information only. In terms of sequence conservation, PGx variants are more conserved than neutral variants and much less conserved than disease variants. A weighted random forest was used to obtain a more balanced classification for PGx variants. In general, structural features help discriminate PGx variants from the other two groups, but classifying PGx variants against neutral polymorphisms remains much less effective than classifying disease variants against neutral variants.
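To illustrate why weighting matters when one class is rare, the following is a minimal sketch of one common weighting scheme, not the authors' implementation. The abstract does not state how the weights were chosen; the sketch assumes inverse-frequency class weights (n_samples / (n_classes * class_count)) and shows their effect on the Gini impurity that a decision tree in a forest would evaluate at a node. The function names and the toy label counts are hypothetical and are not taken from the study's data.

```python
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed scheme for illustration; the abstract does not specify one.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a node where each sample counts with its class weight."""
    totals = defaultdict(float)
    for label in labels:
        totals[label] += weights[label]
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


# Hypothetical toy node: 6 neutral, 3 disease, 1 PGx variant (not real data).
labels = ["neutral"] * 6 + ["disease"] * 3 + ["pgx"] * 1
weights = balanced_class_weights(labels)
uniform = {cls: 1.0 for cls in weights}

print("class weights:", weights)
print("unweighted Gini:", weighted_gini(labels, uniform))
print("weighted Gini:", weighted_gini(labels, weights))
```

With these weights the rare class contributes as much total weight as each common class, so a split that isolates the few PGx samples is rewarded more than it would be under uniform weighting.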
We found that PGx variants are much more similar to neutral variants than to disease variants in the feature space consisting of residue conservation, neighboring residue conservation, number of neighbors, and protein solvent accessibility. This similarity makes it very difficult to distinguish PGx variants from neutral polymorphisms.