Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing
1 Laboratoire d’Informatique Fondamentale de Lille (LIFL, UMR CNRS 8022, Université Lille 1) and Inria Lille – Cité scientifique – Bâtiment M3, 59655 Villeneuve d’Ascq, France
2 Functional and Structural Genomic Platform, Université Lille 2, IFR 114, Lille, France
3 Department of Hematology, Biology and Pathology Center, Lille, France
4 Inserm U-837, Cancer Research Institute, Lille, France
5 Lille Institute for Cancer Research (IRCL), Lille, France
6 SIRIC OncoLille, Lille, France
BMC Genomics 2014, 15:409. doi:10.1186/1471-2164-15-409. Published: 28 May 2014
V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies: in leukemia, they are used to quantify minimal residual disease during patient follow-up. However, the full extent of lymphocyte diversity is not yet understood.
We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. The analysis relies on a seed heuristic and is fast and scalable because its first phase performs no alignment against germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and to data simulating hypermutations. Our methods identified the main clone, as well as additional clones that standard protocols did not detect.
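The seed heuristic summarized above can be illustrated with a minimal Python sketch. It assumes contiguous k-mers as seeds (K = 4), toy V and J germline sequences, a V/J split point chosen to maximize the number of V seeds on its left plus J seeds on its right, and clones defined as reads sharing an identical fixed-length window around the junction. These choices, the parameter values and the names `build_index`, `junction_window` and `clusterize` are hypothetical illustrations, not Vidjil's actual code or API. Note that no alignment against germline sequences is performed: only k-mer membership is tested.

```python
from collections import Counter

K = 4       # hypothetical seed (k-mer) length, toy value
WINDOW = 8  # hypothetical length of the junction window, toy value


def build_index(germlines):
    """Return the set of all k-mers occurring in the given germline sequences."""
    return {g[i:i + K] for g in germlines for i in range(len(g) - K + 1)}


def junction_window(read, v_index, j_index):
    """Return the WINDOW-long substring centred on the putative V/J junction, or None."""
    n = len(read) - K + 1  # number of k-mer positions in the read
    if n <= 0:
        return None
    v = [read[i:i + K] in v_index for i in range(n)]
    j = [read[i:i + K] in j_index for i in range(n)]
    # Prefix/suffix counts: V seeds strictly before split s, J seeds from s on.
    v_before = [0] * (n + 1)
    for i in range(n):
        v_before[i + 1] = v_before[i] + v[i]
    j_after = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        j_after[i] = j_after[i + 1] + j[i]
    # Assumed heuristic: keep the first split maximizing V-left + J-right seeds.
    best = max(range(n + 1), key=lambda s: v_before[s] + j_after[s])
    if v_before[best] == 0 or j_after[best] == 0:
        return None  # no V seed on the left or no J seed on the right
    last_v = max(i for i in range(best) if v[i])
    first_j = min(i for i in range(best, n) if j[i])
    # Centre of the window: midpoint between the end of the last V seed
    # and the start of the first J seed.
    centre = (last_v + K + first_j) // 2
    start = centre - WINDOW // 2
    if start < 0 or start + WINDOW > len(read):
        return None
    return read[start:start + WINDOW]


def clusterize(reads, v_index, j_index):
    """Gather reads into clones keyed by identical junction windows."""
    windows = (junction_window(r, v_index, j_index) for r in reads)
    return Counter(w for w in windows if w is not None)


v_index = build_index(["ACGTTGCA"])  # toy "V" germline
j_index = build_index(["GGATCCTT"])  # toy "J" germline
reads = [
    "ACGTTGCA" + "AAA" + "GGATCCTT",
    "AGGTTGCA" + "AAA" + "GGATCCTT",  # substitution inside V, outside the window
    "ACGTTGCA" + "CCC" + "GGATCCTT",
    "TTTTTTTTTTTT",                   # no V/J seed: discarded
]
clones = clusterize(reads, v_index, j_index)
print(clones)  # Counter({'GCAAAAGG': 2, 'GCACCCGG': 1})
```

Because each read is handled with k-mer lookups only, the cost per read is linear in its length, and the second read falls into the same clone as the first despite its substitution in the V region. In such a design, any alignment against germline sequences could be deferred to a later phase and restricted to one representative per clone.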
The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and into the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.