Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Methodology article

FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

Bernard J Pope1, Tú Nguyen-Dumont2, Fabrice Odefrey2, Fleur Hammet2, Russell Bell3, Kayoko Tao3, Sean V Tavtigian3, David E Goldgar4, Andrew Lonie1, Melissa C Southey2 and Daniel J Park2*

Author Affiliations

1 Victorian Life Sciences Computation Initiative, The University of Melbourne, 187 Grattan Street Carlton, Melbourne, Victoria 3010, Australia

2 Genetic Epidemiology Laboratory, Department of Pathology, Medical Building, The University of Melbourne, Melbourne, Victoria 3010, Australia

3 Huntsman Cancer Institute and Department of Oncological Sciences, University of Utah School of Medicine, Salt Lake City 84112, USA

4 Department of Dermatology, University of Utah School of Medicine, Salt Lake City 8411, USA

For all author emails, please log on.

BMC Bioinformatics 2013, 14:65  doi:10.1186/1471-2105-14-65

Published: 25 February 2013

Abstract

Background

Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to significantly improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in resolving genetic variants that are genuine from the millions of artefactual signals.

Results

FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in the resolution of some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling applied to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs. The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publically available as a suite of software tools.

Conclusions

FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes.

Keywords:
Massively parallel sequencing; Rare genetic variants; Filtering; Annotation; FAVR