Pairwise Alignment Pipeline. Each analysis is represented by a blue box. Blue arrows show the flow of information from one analysis to another, either through dataflow rules (solid arrows) or through the bulk creation of new jobs as part of the analysis (dashed arrows). Red arrows represent control rules, i.e. an analysis that cannot start until the previous one has finished. Black arrows show the creation of new analyses during the execution of the pipeline. Turquoise arrows show alternative paths taken when a particular job fails. Green arrows mark the initial jobs required to run the pipeline. (A) First part of the pairwise alignment pipeline, in which the set of raw alignments is built. First, the ChunkAndGroupDna module creates one DNA Collection for each genome. CreatePairAlignerJobs and CreateFilterDuplicateJobs then create the BlastZ, QueryFilterDuplicates and TargetFilterDuplicates jobs for these DNA Collections. The BlastZ analysis runs all the BLAST jobs. To avoid border effects caused by the initial chunking of long chromosomes, chunks are allowed to overlap partially. The QueryFilterDuplicates and TargetFilterDuplicates analyses remove the duplicates and resolve the inconsistencies in the overlaps between these chunks of sequence. The UpdateMaxAlignmentLength analyses are needed to perform efficient "region queries" in a MySQL database. (B) Second part of the pairwise alignment pipeline, in which the raw alignments are chained and netted. The DumpLargeNibForChains module formats the input files for the axtChain program. The CreateAlignmentChainJobs process creates one AlignmentChains job per pair of genomic segments. Netting follows the same strategy: a single CreateAlignmentNetJobs job creates all the AlignmentNets jobs. Last, the PairwiseHealthCheck analysis runs a set of sanity tests on the resulting data. (C) Timeline of this pipeline when aligning the human and the pika genomes.
Severin et al. BMC Bioinformatics 2010 11:240 doi:10.1186/1471-2105-11-240
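
The partially overlapping chunks mentioned in part (A) can be illustrated with a minimal sketch. This is not the ChunkAndGroupDna implementation: the function name `chunk_with_overlap`, its parameters, and the 1-based inclusive coordinates are all assumptions made for illustration. It only shows why neighbouring chunks share a stretch of sequence, which is what later makes duplicate filtering necessary.

```python
def chunk_with_overlap(seq_length, chunk_size, overlap):
    """Split a sequence of seq_length bases into overlapping chunks.

    Hypothetical helper, not part of the pipeline described above.
    Returns a list of 1-based, inclusive (start, end) tuples. Consecutive
    chunks share `overlap` bases, so an alignment that spans a chunk
    boundary can still be found within a single chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    chunks = []
    step = chunk_size - overlap  # distance between consecutive chunk starts
    start = 1
    while start <= seq_length:
        end = min(start + chunk_size - 1, seq_length)
        chunks.append((start, end))
        if end == seq_length:  # the last chunk reaches the sequence end
            break
        start += step
    return chunks


if __name__ == "__main__":
    # Toy values; real chunk sizes would be far larger.
    for start, end in chunk_with_overlap(seq_length=10, chunk_size=4, overlap=1):
        print(start, end)
```

Because the same bases appear in two neighbouring chunks, a hit inside the shared stretch can be reported twice, once per chunk. That is the situation the QueryFilterDuplicates and TargetFilterDuplicates analyses are described as resolving.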