Open Access Software

TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data

Aaron L Sarver1*, Jesse Erdman1, Tim Starr2, David A Largaespada3 and Kevin A T Silverstein1

Author Affiliations

1 Biostatistics and Bioinformatics, Masonic Cancer Center, University of Minnesota, Minneapolis, USA

2 Obstetrics, Gynecology & Women's Health and Masonic Cancer Center, University of Minnesota, Minneapolis, USA

3 Department of Genetics, Cell Biology and Development, Pediatrics, and Masonic Cancer Center, University of Minnesota, Minneapolis, USA


BMC Bioinformatics 2012, 13:154  doi:10.1186/1471-2105-13-154

Published: 29 June 2012

Additional files

Additional file 1:

Additional supporting tables.

Table S1. Suspect regions identified in ChIP-seq data. These are CISs found to be highly significant in data sets built from real mouse sequence, obtained as a ChIP-seq control and randomly assigned to libraries. The trial was repeated with three different subsets of the data (A, B and C). Regions returned in all three trials are labeled "BADrepeat" and are not reported as CIS drivers.

Table S2. Comparison of the window sizes and insertion counts deemed significant by a Poisson distribution followed by Bonferroni correction with those calculated by Monte Carlo simulation, for the colon cancer dataset and the liver cancer dataset.

Table S3. CISs calculated by the TAPDANCE method for the colon cancer dataset.

Table S4. CISs calculated by the TAPDANCE method for the liver cancer dataset.

Table S5. CISs calculated by the TAPDANCE method for the combined datasets.

Table S6. Association results for the combined datasets. Highly significant results are shown in bold for (A) phenoCIS associations and (B) coCIS associations.

Table S7. Examples of the four files required in the data directory to run the command-line version of TAPDANCE: (A) a file containing the sequences, named seqs.tab; (B) a tab-delimited file containing the barcodes, the library names (ending in either -L or -R according to the direction of priming) and the direction of priming (Left or Right); (C) a tab-delimited text file defining the groups for CIS analyses, in which the default superset for association should be labeled "all" and subsets should be given meaningful names of six or fewer characters; (D) a text file listing chromosomes that should not be analyzed because of the presence of the donor transposon concatemer and local hopping.

Table S8. Counts from the initial mapping and the number of sequences with the described characteristics (A) for the entire dataset, (B) broken down by directional library, and (C) totaled for each library after combining left- and right-primed reads. Part C additionally reports the number of mappable reads, the number of reads that map, and the total number of regions that map at the defined threshold of the total mapped sequences. We have typically observed 50-80% of reads mapping to the genome of the mutagenized organism; substantially lower mapping percentages would indicate potential problems.

Table S9. CISs identified in the RTCGD retroviral insertion dataset.

Table S10. Co-CISs identified in the RTCGD retroviral insertion dataset.

Table S11. CISs identified by the TAPDANCE methodology in a pancreatic ductal adenocarcinoma SB screen, compared directly with the top 20 CISs generated by the modified Gaussian kernel convolution framework.
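Table S2 compares the significance thresholds obtained from a Poisson model with Bonferroni correction against those from Monte Carlo simulation. The Poisson side of that comparison can be sketched as below. This is a minimal illustration, not TAPDANCE code: it assumes insertions are spread uniformly over the genome, that windows do not overlap (so the number of tests is the genome size divided by the window size) and that alpha = 0.05; the function names are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 minus the CDF at k - 1.

    Adequate for the small expected counts typical of a single window;
    exp(-lam) underflows for very large lam (roughly > 700).
    """
    cdf = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += pmf
        pmf *= lam / (i + 1)  # P(X = i + 1) derived from P(X = i)
    return max(0.0, 1.0 - cdf)


def min_significant_count(n_inserts: int, genome_size: int, window: int,
                          alpha: float = 0.05) -> int:
    """Smallest insertion count in one window that is significant after Bonferroni.

    Hypothetical helper; assumes uniform insertions and non-overlapping windows.
    """
    lam = n_inserts * window / genome_size  # expected insertions per window
    n_tests = genome_size // window  # number of non-overlapping windows tested
    threshold = alpha / n_tests  # Bonferroni-corrected per-window alpha
    k = 1
    while poisson_sf(k, lam) >= threshold:
        k += 1
    return k


print(min_significant_count(1000, 1_000_000, 1000))
```

For 1,000 insertions over a 1 Mb genome with 1 kb windows (expected count 1 per window, 1,000 tests), the call prints 8. Because real insertions are not uniform, an analytic threshold of this kind is exactly what a Monte Carlo comparison such as Table S2 is meant to check.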

Format: XLS Size: 208KB



Additional file 2:

Example data. A zipped archive (archive.zip) containing the four required data files in the data directory, together with the results obtained by running the scripts.

Format: ZIP Size: 1.7MB
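Additional file 2 bundles the four input files whose layouts are illustrated in Table S7 of Additional file 1. As a minimal sketch of how two of them could be checked before a run, the Python below reads the barcode file (B) and the excluded-chromosome file (D). It is not part of TAPDANCE: the column order (barcode, library, direction), the one-chromosome-per-line layout and all names used here are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Suffix each library name is expected to carry for a given priming direction.
SUFFIX_FOR_DIRECTION = {"Left": "-L", "Right": "-R"}


@dataclass
class BarcodeEntry:
    barcode: str
    library: str
    direction: str  # "Left" or "Right"


def parse_barcodes(text: str) -> list[BarcodeEntry]:
    """Parse a barcode file, assuming the column order barcode, library, direction."""
    entries = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not any(field.strip() for field in row):
            continue  # skip blank lines
        if len(row) < 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        barcode, library, direction = (field.strip() for field in row[:3])
        suffix = SUFFIX_FOR_DIRECTION.get(direction)
        if suffix is None:
            raise ValueError(f"line {line_no}: direction must be Left or Right")
        if not library.endswith(suffix):
            raise ValueError(f"line {line_no}: {library!r} should end in {suffix}")
        entries.append(BarcodeEntry(barcode, library, direction))
    return entries


def parse_excluded_chromosomes(text: str) -> set[str]:
    """Read chromosome names to skip, assuming one name per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}
```

Checking the -L/-R suffix against the stated priming direction flags a mislabeled library early, which matters because left- and right-primed reads are later combined per library (Table S8, part C).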
