This article is part of the supplement: SNP-SIG 2013: Identification and annotation of genetic variants in the context of structure, function, and disease

Open Access Research

In silico comparative characterization of pharmacogenomic missense variants

Biao Li¹, Chet Seligman¹, Janita Thusberg¹, Jackson L Miller¹, Jim Auer¹, Michelle Whirl-Carrillo², Emidio Capriotti³, Teri E Klein² and Sean D Mooney¹*

Author Affiliations

1 The Buck Institute for Research on Aging, Novato, CA, USA

2 Department of Genetics, Stanford University, Stanford, CA, USA

3 Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA


BMC Genomics 2014, 15(Suppl 4):S4  doi:10.1186/1471-2164-15-S4-S4

Published: 20 May 2014



Missense pharmacogenomic (PGx) variants are amino acid substitutions that may affect the pharmacokinetic (PK) or pharmacodynamic (PD) response to drug therapies. PGx variants have not been investigated as deeply as disease-associated variants. The ability to computationally predict new PGx variants is desirable; however, it is not clear which data sets should be used or which features are useful for this purpose. We therefore carried out a comparative characterization of PGx variants against annotated neutral and disease variants from UniProt, testing the predictive power of sequence conservation and structural information in discriminating among these three groups.


We selected 126 high-quality PGx variants from PharmGKB and created two data sets: one contained 416 variants with both structural and sequence information, and the other contained 1,265 variants with sequence information only. In terms of sequence conservation, PGx variants are more conserved than neutral variants and much less conserved than disease variants. A weighted random forest was used to achieve a more balanced classification of PGx variants. In general, structural features help discriminate PGx variants from the other two groups, but classifying PGx variants against neutral polymorphisms remains much less effective than classifying disease variants against neutral variants.
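To illustrate why weighting can balance classification when one class is small, here is a minimal sketch of inverse-frequency class weights and the class-weighted Gini impurity that a decision tree in a weighted random forest could use when scoring a node. The weighting formula, the function names `balanced_class_weights` and `weighted_gini`, and the toy labels are assumptions made for illustration only; the abstract does not state which weighting scheme was actually used, and the labels are not the study's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed scheme for illustration; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a tree node, counting each sample by its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


# Hypothetical toy labels (NOT the paper's data): PGx is the minority class.
toy_labels = ["pgx"] * 2 + ["neutral"] * 6 + ["disease"] * 4

weights = balanced_class_weights(toy_labels)
uniform = {cls: 1.0 for cls in weights}

print(weights)
print("unweighted Gini:", weighted_gini(toy_labels, uniform))
print("weighted Gini:  ", weighted_gini(toy_labels, weights))
```

With these weights each class contributes equally to the node impurity, so splits that isolate the minority class are no longer drowned out by the larger classes.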


We found that PGx variants are much more similar to neutral variants than to disease variants in a feature space consisting of residue conservation, neighboring residue conservation, number of neighbors, and protein solvent accessibility. This similarity makes it very difficult to distinguish PGx variants from neutral polymorphisms.