Table 2

An example of a hierarchical alignment and assembly protocol specification

Alignment and Assembly

Preprocessing step: Extracting a sub-sequence of the genomic sequence. This step is optional, but it can be useful for preliminary tests and protocol validation because it restricts the size of the sequences and expedites the computation.

Input: read files produced by the Illumina sequencing pipeline (sequence.txt files)

Tool: LONI Sub-Sequence extractor

Server Location:/projects1/idinov/projects/scripts/extract_lines_from_Textfile.sh

Output: Shorter sequence.fastq file
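The table does not show the internals of extract_lines_from_Textfile.sh. As a minimal sketch of the same idea, assuming the reads are stored as standard four-line FASTQ records, the following Python function copies only the first records of a file into a shorter one. The name extract_first_reads and its parameters are hypothetical and are not part of the LONI tool.

```python
from itertools import islice
from pathlib import Path


def extract_first_reads(src: Path, dst: Path, n_reads: int) -> int:
    """Copy the first n_reads FASTQ records (4 lines each) from src to dst.

    Returns the number of complete records that were copied.
    Assumption: every record occupies exactly four lines.
    """
    lines_written = 0
    with src.open() as fin, dst.open("w") as fout:
        # islice stops after 4 * n_reads lines, or earlier at end of file
        for line in islice(fin, 4 * n_reads):
            fout.write(line)
            lines_written += 1
    return lines_written // 4
```

Calling extract_first_reads(Path("sequence.txt"), Path("short.fastq"), 100000) would keep the first 100,000 reads, which is usually enough to validate the rest of the protocol before running it on the full data.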

Data conversion: Conversion of Solexa FASTQ files to Sanger FASTQ format

Input: read files produced by the Illumina sequencing pipeline (sequence.txt files)

Tool: MAQ (sol2sanger option): Mapping and Assembly with Quality

Server Location:/applications/maq

Output: sequence.fastq file
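For context on this conversion: Solexa-scaled qualities are conventionally stored with an ASCII offset of 64, whereas Sanger FASTQ stores Phred qualities with an offset of 33, and the two scales are related by Q_phred = 10 log10(10^(Q_solexa/10) + 1). The sketch below applies that standard relation to a single quality string. It illustrates what a sol2sanger-style conversion computes and is not MAQ's own code; solexa_to_sanger is a hypothetical name.

```python
import math


def solexa_to_sanger(qual: str) -> str:
    """Convert one Solexa-encoded quality string to Sanger encoding.

    Assumption: the input uses Solexa-scaled scores with ASCII offset 64.
    """
    out = []
    for ch in qual:
        q_solexa = ord(ch) - 64
        # Map the Solexa score onto the Phred scale
        q_phred = 10.0 * math.log10(10.0 ** (q_solexa / 10.0) + 1.0)
        # Round to the nearest integer and re-encode with offset 33
        out.append(chr(int(q_phred + 0.5) + 33))
    return "".join(out)
```

High scores are almost unchanged by the mapping (Solexa 40 becomes Phred 40), while low scores shift noticeably (Solexa 0 becomes Phred 3), so the conversion matters most for low-quality bases.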

Binary conversion: Conversion of the FASTQ file to a binary FASTQ (bfq) file

Input: sequence.fastq file

Tool: MAQ (fastq2bfq option)

Server Location:/applications/maq

Output: sequence.bfq file

Reference conversion: Conversion of the reference genome (FASTA format) to binary FASTA (bfa)

Input: reference.fasta file (to perform the alignment)

Tool: MAQ (fasta2bfa option)

Server Location:/applications/maq

Output: reference.bfa file

Sequence alignment: Alignment of the sequence data to the reference genome

Using MAQ:

Input: sequence.bfq, reference.bfa

Tool: MAQ (map option)

Server Location:/applications/maq

Output: alignment.map file
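The MAQ steps listed so far can be chained from a script. The sketch below only builds the argument lists, using the file names from the table and the documented MAQ sub-command usage (sol2sanger, fastq2bfq, fasta2bfa, map). The function name maq_commands is hypothetical, and the default executable name "maq" is an assumption; on the server described here it would presumably resolve under /applications/maq.

```python
from typing import List


def maq_commands(reads_txt: str, reference_fasta: str,
                 maq: str = "maq") -> List[List[str]]:
    """Return the argument lists for the MAQ steps, in table order."""
    return [
        [maq, "sol2sanger", reads_txt, "sequence.fastq"],
        [maq, "fastq2bfq", "sequence.fastq", "sequence.bfq"],
        [maq, "fasta2bfa", reference_fasta, "reference.bfa"],
        # maq map takes the output .map file first, then the reference, then the reads
        [maq, "map", "alignment.map", "reference.bfa", "sequence.bfq"],
    ]
```

Each list can be passed to subprocess.run(argv, check=True) so that a failing step stops the chain before the next one starts.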

Using Bowtie:

Input: reference.fai, sequence.bfq

Tool: Bowtie (map option)

Server Location:/applications/bowtie

Output: alignment.sam file

Indexing: Indexing the reference genome

Input: reference.fa

Tool: samtools (faidx option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: reference.fai

Mapping conversion:

MAQ2SAM:

Input: alignment.map file

Tool: samtools (maq2sam-long option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.sam file

SAM to full BAM:

Input: alignment.sam, reference.fai file

Tool: samtools (view -bt option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.bam file
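The two mapping-conversion steps above can be sketched in the same style. maq2sam-long writes SAM to standard output, so each command is paired with the file that should receive its output, if any. The executable names, the function names and the Command tuple layout are assumptions for illustration; the view -bt and -o flags follow the legacy samtools 0.1.x usage.

```python
import subprocess
from typing import List, Optional, Tuple

# (argument list, file that receives stdout or None)
Command = Tuple[List[str], Optional[str]]


def mapping_conversion_commands(map_file: str = "alignment.map",
                                fai_file: str = "reference.fai",
                                samtools: str = "samtools",
                                maq2sam: str = "maq2sam-long") -> List[Command]:
    """Build the MAQ-to-SAM and SAM-to-BAM commands with the table's file names."""
    return [
        # maq2sam-long prints SAM records to stdout
        ([maq2sam, map_file], "alignment.sam"),
        # -b: write BAM, -t: reference index listing sequence names and lengths
        ([samtools, "view", "-bt", fai_file, "-o", "alignment.bam",
          "alignment.sam"], None),
    ]


def run_command(argv: List[str], stdout_path: Optional[str] = None) -> None:
    """Run one command, redirecting stdout to a file when requested."""
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "wb") as out:
            subprocess.run(argv, check=True, stdout=out)
```

A driver loop would simply call run_command(argv, target) for each pair returned by mapping_conversion_commands().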

Removal of duplicate reads:

Input: alignment.bam file

Tool: samtools (rmdup)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.bam file

Sorting:

Input: alignment.rmdup.bam file

Tool: samtools (sort option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.bam file

MD tagging:

Input: alignment.rmdup.sorted.bam file and reference REF.fasta file

Tool: samtools (calmd option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.calmd.bam file

Indexing:

Input: alignment.rmdup.sorted.calmd.bam file

Tool: samtools (index option)

Server Location:/applications/samtools-0.1.7_x86_64-linux

Output: alignment.rmdup.sorted.calmd.bam.bai file
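The last four steps form a chain in which each output name extends the previous one. The sketch below derives those names and the matching samtools commands in the order given in the table. It assumes the legacy samtools 0.1.x syntax, in which sort takes an output prefix rather than a file name and calmd writes to standard output; the -b flag for BAM output from calmd is also an assumption for this version. postprocess_commands is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (argument list, file that receives stdout or None)
Command = Tuple[List[str], Optional[str]]


def postprocess_commands(bam: str = "alignment.bam",
                         reference: str = "REF.fasta",
                         samtools: str = "samtools") -> List[Command]:
    """Build the rmdup, sort, calmd and index commands in table order."""
    if not bam.endswith(".bam"):
        raise ValueError("expected a .bam file name")
    stem = bam[:-len(".bam")]
    rmdup_bam = stem + ".rmdup.bam"
    sorted_prefix = stem + ".rmdup.sorted"  # legacy sort appends ".bam" itself
    sorted_bam = sorted_prefix + ".bam"
    calmd_bam = sorted_prefix + ".calmd.bam"
    return [
        ([samtools, "rmdup", bam, rmdup_bam], None),
        ([samtools, "sort", rmdup_bam, sorted_prefix], None),
        # calmd prints its result, so stdout is captured into the new BAM file
        ([samtools, "calmd", "-b", sorted_bam, reference], calmd_bam),
        # index writes <calmd_bam>.bai next to its input
        ([samtools, "index", calmd_bam], None),
    ]
```

Each pair can be executed with subprocess.run, redirecting standard output to the second element whenever it is not None.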


This protocol is implemented as a Pipeline graphical workflow and demonstrated in the Results section; Figure 3 shows the corresponding workflow.
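One way to picture how a workflow environment can derive an execution order from such a specification is to treat each step as a node linked to the others by the files it consumes and produces. The sketch below does this for the MAQ branch of Table 2. It is an illustration only, not the LONI Pipeline's implementation; the step names, the Step tuple layout and execution_order are hypothetical, and the file names are used exactly as they appear in the table.

```python
from typing import Iterable, List, Set, Tuple

Step = Tuple[str, Set[str], Set[str]]  # (name, input files, output files)

# MAQ branch of the table, deliberately listed out of order.
STEPS: List[Step] = [
    ("sam_to_bam", {"alignment.sam", "reference.fai"}, {"alignment.bam"}),
    ("maq_map", {"sequence.bfq", "reference.bfa"}, {"alignment.map"}),
    ("faidx", {"reference.fa"}, {"reference.fai"}),
    ("maq2sam", {"alignment.map"}, {"alignment.sam"}),
    ("fastq2bfq", {"sequence.fastq"}, {"sequence.bfq"}),
    ("fasta2bfa", {"reference.fasta"}, {"reference.bfa"}),
    ("sol2sanger", {"sequence.txt"}, {"sequence.fastq"}),
]


def execution_order(steps: Iterable[Step],
                    initial_files: Iterable[str]) -> List[str]:
    """Order steps so that every input exists before the step that needs it."""
    available = set(initial_files)
    pending = list(steps)
    order: List[str] = []
    while pending:
        # A step is ready when all of its inputs are already available
        ready = [s for s in pending if s[1] <= available]
        if not ready:
            blocked = ", ".join(name for name, _, _ in pending)
            raise ValueError("missing inputs for: " + blocked)
        for name, _, outputs in ready:
            order.append(name)
            available |= outputs
        done = {name for name, _, _ in ready}
        pending = [s for s in pending if s[0] not in done]
    return order
```

Given the three starting files sequence.txt, reference.fasta and reference.fa, the function returns an order in which every conversion precedes the alignment and the alignment precedes the SAM and BAM conversions. If a starting file is missing, it raises a ValueError naming the steps that cannot run.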

Dinov et al. BMC Bioinformatics 2011 12:304   doi:10.1186/1471-2105-12-304
