Open Access Methodology article

Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing

Mathieu Giraud1*, Mikaël Salson1*, Marc Duez1,6, Céline Villenet2, Sabine Quief2,5, Aurélie Caillault3, Nathalie Grardel3, Christophe Roumier3,4, Claude Preudhomme3,4 and Martin Figeac2

Author Affiliations

1 Laboratoire d’Informatique Fondamentale de Lille (LIFL, UMR CNRS 8022, Université Lille 1) and Inria Lille – Cité scientifique – Bâtiment M3, 59655 Villeneuve d’Ascq, France

2 Functional and Structural Genomic Platform, Université Lille 2, IFR 114, Lille, France

3 Department of Hematology, Biology and Pathology Center, Lille, France

4 Inserm U-837, Cancer Research Institute, Lille, France

5 Lille Institute for Cancer Research (IRCL), Lille, France

6 SIRIC OncoLille, Lille, France


BMC Genomics 2014, 15:409  doi:10.1186/1471-2164-15-409

Published: 28 May 2014



V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not yet well understood.


We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because, in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also to data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols.
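To make the idea of alignment-free, seed-based clustering concrete, here is a minimal Python sketch. It is not the Vidjil implementation. It assumes contiguous k-mers as seeds, a tiny made-up germline set, and the convention that reads sharing an identical window around the junction belong to the same clone. The names `build_index`, `junction_window` and `cluster`, the constants `K` and `W`, and all sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

K = 4  # seed length (toy value, an assumption of this sketch)
W = 8  # window length around the junction (toy value)


def kmers(seq, k=K):
    """All contiguous k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(v_genes, j_genes, k=K):
    """Label each germline k-mer 'V' or 'J'; k-mers seen in both become '?'."""
    index = {}
    for label, genes in (("V", v_genes), ("J", j_genes)):
        for gene in genes:
            for km in kmers(gene, k):
                index[km] = label if index.get(km, label) == label else "?"
    return index


def junction_window(read, index, k=K, w=W):
    """Return the w-base window centred on the estimated junction, or None."""
    labels = [index.get(km) for km in kmers(read, k)]
    v_hits = [i for i, lab in enumerate(labels) if lab == "V"]
    j_hits = [i for i, lab in enumerate(labels) if lab == "J"]
    if not v_hits or not j_hits or v_hits[-1] >= j_hits[0]:
        return None  # no clear "V seeds, then J seeds" pattern
    # Midpoint between the end of the last V seed and the start of the first J seed.
    center = (v_hits[-1] + k + j_hits[0]) // 2
    start = center - w // 2
    if start < 0 or start + w > len(read):
        return None  # read too short around the junction
    return read[start:start + w]


def cluster(reads, index):
    """Count reads per junction window; each distinct window is one clone."""
    clones = Counter()
    for read in reads:
        window = junction_window(read, index)
        if window is not None:
            clones[window] += 1
    return clones


# Made-up germline sequences and reads, for illustration only.
V_GENES = ["ACGTACGGTCAATG"]
J_GENES = ["TTGACCGGATCCAA"]
READS = [
    "ACGTACGGTCAAGGACCGGATCCAA",  # clone A
    "ACGTACGGTCAAGGACCGGATCCAA",  # clone A again
    "ACTTACGGTCAAGGACCGGATCCAA",  # clone A with an error far from the junction
    "ACGTACGGTCTTTGACCGGATCCAA",  # clone B
    "ACGTACGGTCAATG",             # V seeds only: left unclustered
]

index = build_index(V_GENES, J_GENES)
clones = cluster(READS, index)
for window, count in clones.most_common():
    print(window, count)
```

In this toy run, the three reads of clone A fall into one cluster even though one of them carries a substitution, because the error lies outside the window. Only dictionary lookups of k-mers are needed to place the window, which is the kind of shortcut that lets a first pass avoid aligning every read against the germline database.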


The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also into the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.

Keywords: Sequence analysis; High-throughput sequencing; V(D)J recombinations; Repertoire sequencing; Immunology; Leukemia; Minimal residual disease follow-up