Greg Gibson and Thanawadee Preeprem on a new approach for interpreting personal genomes

Posted by Biome on 11th February 2014


Personal genomics – a process of determining the genome sequence of an individual and assessing the likely consequences of that person’s genetic variation – is increasingly the focus of attention in both the public and the scientific community. This is due to the great potential it holds to predict diseases, identify mutations for family planning purposes, and guide medical treatment. Next-generation sequencing technologies are also making medical treatment based on personal genomics a more affordable prospect. Still, understanding of the impact of specific genetic variants (non-synonymous single nucleotide polymorphisms, or nsSNPs), and of whether they are damaging or non-damaging, remains limited due to the complexity of personal genome interpretation and the need for a more holistic and integrative variant classification scheme. In a recent study in BioData Mining, Greg Gibson and Thanawadee ‘Bee’ Preeprem from the Georgia Institute of Technology, USA, discuss their integrative approach to assessing genetic variation through an eight-level classification scheme. Here, Gibson and Preeprem explain how this approach could aid research and healthcare, and share their thoughts on how personal genomics has changed and will change.

 

What was the motivation behind this study, and how did your previous research lead up to it?

Efforts to predict whether a mutation is deleterious or not will be an increasingly important component of personal genomics. Currently, evolutionary conservation is the major criterion at the heart of most algorithms. Bee Preeprem joined my lab after a couple of years working on structural biology, and we hatched the idea of trying to infer deleteriousness from first principles of protein structure, which she is doing. Along the way she started assembling a database of functional data to go along with the structural data, and found that there is a need to make data mining of existing prediction algorithms and variant databases more practical. That is when the idea of adjusting the existing methods using functional evidence emerged.

 

Can you briefly explain what a variant classification scheme is and why a more integrative scheme is needed?

It is well known that the half-dozen schemes that rely mainly on DNA sequence data, notably PolyPhen, SIFT, and MutationTaster, only agree for the most severe variants, leaving a large grey area for those that are less obvious. This means sequence conservation alone is not the major determinant of function and deleteriousness. So why not combine different types of evidence, such as existing databases that implicate the variant in disease or a phenotype? That’s what the Association-Adjusted Consensus Deleterious Scheme (AACDS) does; it groups variants into categories based on both consensus deleterious measures and association data. It turns out that a couple of other groups had a similar idea: the developers of eXtasy and Phen-Gen actually combine disease phenotype data with the deleterious measures into a statistical ranking, instead of aiming for a broad classification as we do.
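To make the idea of grouping by consensus and association concrete, here is a minimal sketch. It is not the published AACDS rule set or its eight levels: the `Variant` structure, the three consensus bins and the category labels are all hypothetical, chosen only to show how predictor agreement and association evidence might be combined into a broad category.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    # Hypothetical input: predictor name -> True if it calls the variant deleterious
    predictor_calls: dict
    # Hypothetical input: True if some database links the variant to a disease or phenotype
    has_association: bool


def consensus_fraction(variant):
    """Fraction of predictors that call the variant deleterious (0.0 if none ran)."""
    calls = list(variant.predictor_calls.values())
    return sum(calls) / len(calls) if calls else 0.0


def classify(variant):
    """Combine predictor consensus with association evidence into a broad label.

    The bins below are illustrative assumptions, not the AACDS levels.
    """
    frac = consensus_fraction(variant)
    if frac == 1.0:
        consensus = "high"
    elif frac > 0.0:
        consensus = "mixed"
    else:
        consensus = "low"
    evidence = "associated" if variant.has_association else "unassociated"
    return f"{consensus}-consensus/{evidence}"


if __name__ == "__main__":
    example = Variant(
        name="example-variant",
        predictor_calls={"PolyPhen": True, "SIFT": False, "MutationTaster": True},
        has_association=True,
    )
    print(example.name, classify(example))
```

In this toy version, a variant on which the predictors disagree still lands in a distinct category if a database ties it to a phenotype, which is the kind of grey-area case the combined approach is meant to help with.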

 

How does your ranking system complement traditional sequence-based approaches?

Sequence-based measures typically look at allele frequencies, evolutionary conservation, and tolerance of each gene to mutations. But there is no reason why a functional variant has to affect a conserved site, and not all conserved sites mutate to functional alleles. Our AACDS scheme is designed to generate a best estimate of the clinical significance of each variant of interest, using available resources. Since thousands of genes (4,225) and tens of thousands of individual amino acid variants (21,557) have now been annotated to disease, it seems obvious to try to combine the two types of data. A third type of data we are working on is protein structure data, and more and more, researchers are also considering high-throughput experimental assays for function.

 

How do you see this classification scheme aiding future research?

The main thing will be to help prioritize variants for expensive follow-up assays. Say you perform whole exome sequencing on 100 children with a rare disease, and come up with 150 candidate mutations. Statistical estimates suggest that maybe 50 of these are likely to be functional, but you don’t know which ones. The AACDS scheme can be used to prioritize the top 10 or 20, helping to focus resources for follow-up experiments and validation. Anyone interested in implementing it can access a search tool that Bee has developed, which is now live here.
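The prioritization step described above can be sketched in a few lines. This is only an illustration: the numeric priority scores and the `prioritize` helper are hypothetical stand-ins, and the real scheme assigns categories rather than the made-up numbers used here.

```python
def prioritize(candidates, top_n):
    """Return the identifiers of the top_n candidates, highest priority first.

    candidates: list of (variant_id, priority) pairs, where a larger
    priority means the variant is more worth following up (an assumption
    made for this sketch).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return [variant_id for variant_id, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical candidates with made-up priorities
    candidates = [("var_a", 2), ("var_b", 7), ("var_c", 5), ("var_d", 1)]
    print(prioritize(candidates, top_n=2))
```

With 150 candidates and `top_n` set to 10 or 20, the returned list would be the shortlist sent on for follow-up experiments.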

 

How does this classification scheme fit into the context of personal genomics in healthcare? What impact do you think it could have?

Personalized genomics could go in any one of dozens of directions. Our score is not so much designed to identify the most likely causal variant in an individual genome, as to prioritize those that are most likely in a sample of individual sequences. As databases of whole exome data, and larger and larger meta-analyses of GWAS data, come online, we hope that the AACDS matures into a more accurate classifier.

 

How do you foresee the role of genomics in healthcare changing in the next 10 years?

How much time do you have?! I’m teaching a new class on Predictive Health at Georgia Tech this semester, and there are as many different attitudes to this as there are students. Most are all over it, assuming that personal genomes are an inevitable part of individualized medical care and thinking about how they can get involved. A few are legitimately more skeptical (budding scientists, no doubt). What is clear to us all is that it is not just about the genomics; there are fundamental issues of education, insurance, counselling, access to care, and attitudes that need to be addressed. Perhaps rare variants such as those targeted with the AACDS will be important in the first wave, but I think common variant genetic risk scores will also be important, and of course gene expression and metabolomic data. All linked to clinical measures as well. Exciting times.

 

What do you think are the most important questions that we should be asking in personal genomics, from a research perspective?

I wouldn’t mind seeing a switch in emphasis from predictive health to predictive therapeutics. It is one thing to understand why some people get diseases, but it is actually unlikely that genetic predictors of rare diseases will ever have high precision. It is perhaps even less likely that people will do anything about their genetic risks. But if we can work out why some people go into remission and others suffer constant relapses despite the same types of care (not just pharmacology, but surgery, hospice care, dietary intervention, counselling, etc.), we may make more of a difference for more people.

 

What are you up to next?

Bee is working on incorporating protein structure prediction directly into a new deleteriousness score, and has some encouraging new data. My interests are in whether and how transcriptomics can be used in personalized medicine, and how to integrate genomic, clinical, transcriptome and metabolome data into personal profiles that define wellness in a very personal way. I’ve also started a blog on the genome, The Genomes Take.

 

More about the author(s)

Greg Gibson, Professor of Biology, Georgia Institute of Technology, USA.

Greg Gibson is the Director of the Center for Integrative Genomics at the Georgia Institute of Technology and uses genomics to understand human disease. For many years his lab studied the fruit fly Drosophila, but now they apply their understanding of the genetics and genomics of that organism to humans. In particular, his group are interested in how genetics and the environment combine to alter susceptibility to disease. Gibson is also on the Editorial Board of Genome Medicine.

 

Thanawadee ‘Bee’ Preeprem, PhD student, Georgia Institute of Technology, USA.

 

 

Thanawadee ‘Bee’ Preeprem is undertaking her PhD in bioinformatics, with an emphasis on structural biology, under the guidance of Stephen Harvey at the Georgia Institute of Technology, USA. Preeprem also investigates protein structure as applied to the development of improved classification schemes for personal genomic interpretation, in the laboratory of Greg Gibson, also at the Georgia Institute of Technology, USA.