Figure 2.

TAPDANCE database schema and processing flowchart. Overview of the TAPDANCE process. Input files are loaded into database tables named with the project id. SQL and Perl functions are used to identify the library of origin and the genomic sequence, to remove duplicate sequences, and to identify insert locations using the bowtie mapping algorithm. The mapping process is iterative. In the first round, sequences longer than 33 bp are mapped allowing 3 mismatches. In the second round, sequences that did not map in the first round are remapped after removal of the 3’UTR to leave only 33 bases. Similarly, in the third round the remaining unmapped sequences, at 30 bp, are mapped allowing 2 mismatches. In the fourth round, previously unmapped sequences of length 28 bp are mapped allowing 1 mismatch. Finally, previously unmapped sequences of length 24 bp are mapped allowing 0 mismatches. The mapped data are summarized and exported by the script using configurable data stored in the script, including barcodes and insertion-derived sequences. The scripts assemble sets of inserts and conduct CIS, Co-CIS and Pheno-CIS analyses, producing exportable files containing relevant information. All file locations are shown relative to root; additional intermediate tables are generated during processing, as documented within the various scripts and dependencies. Persistent tables and result files are named using the $proj variable, which is set in the file.
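
The iterative mapping schedule described in the caption can be sketched in a few lines of Python. This is an illustration only, not the TAPDANCE code, which uses SQL and Perl with bowtie. The names `ROUNDS`, `iterative_map` and `try_map` are hypothetical; `try_map` stands in for an aligner call. Two points are assumptions not stated in the caption: the second round is given a limit of 3 mismatches, and shortening a read keeps its first N bases (trimming from the 3' end).

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# (label, trimmed length, allowed mismatches). A length of None means
# "map the full read, provided it is longer than 33 bp" (round 1).
# ASSUMPTION: the caption gives no mismatch limit for round 2; 3 is a guess.
ROUNDS: List[Tuple[str, Optional[int], int]] = [
    ("round1", None, 3),
    ("round2", 33, 3),
    ("round3", 30, 2),
    ("round4", 28, 1),
    ("round5", 24, 0),
]


def iterative_map(
    reads: Dict[str, str],
    try_map: Callable[[str, int], Optional[Hashable]],
) -> Tuple[Dict[str, Tuple[str, Hashable]], Dict[str, str]]:
    """Map reads in successive rounds with shorter, stricter alignments.

    try_map(sequence, max_mismatches) is a hypothetical stand-in for the
    aligner; it returns a hit (any value) or None when nothing maps.
    Returns (mapped, unmapped), where mapped[read_id] = (round label, hit).
    """
    mapped: Dict[str, Tuple[str, Hashable]] = {}
    unmapped = dict(reads)
    for label, length, mismatches in ROUNDS:
        still_unmapped: Dict[str, str] = {}
        for read_id, seq in unmapped.items():
            if length is None:
                # Round 1: only reads longer than 33 bp, untrimmed.
                candidate = seq if len(seq) > 33 else None
            else:
                # ASSUMPTION: keep the first `length` bases (trim the 3' end).
                candidate = seq[:length] if len(seq) >= length else None
            hit = try_map(candidate, mismatches) if candidate else None
            if hit is not None:
                mapped[read_id] = (label, hit)
            else:
                still_unmapped[read_id] = seq
        # Only reads that failed this round are carried into the next one.
        unmapped = still_unmapped
    return mapped, unmapped
```

Each round sees only the reads that failed every earlier round, so a read is recorded with the longest, most permissive alignment that succeeds. Reads shorter than 24 bp are never attempted and remain in the unmapped set.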

Sarver et al. BMC Bioinformatics 2012 13:154 doi:10.1186/1471-2105-13-154