Bioinformatics workflow. 1) L-SAGE and Tag-seq experiments were processed separately. 2) Experiments were divided into two groups, healthy and cancer. 3) Tags (thin black arrows) present in at least k% of healthy and at least k% of cancer experiments (k = 75 for L-SAGE and k = 90 for Tag-seq) were selected. 4) The 5' ends of the tags were extended with the CATG (NlaIII) motif, generating 21-base sequences (4 + 17), which were aligned to the human genome (long, thick black line) using blastn. 5) Tags matching the human genome exactly once were selected as RT (thick black arrows). 6) For each RT, sbsRT (thick black arrows carrying an ellipse) were searched for among all the tags and collected with their counts. 7) sbsRT matching a known SNP were excluded from SBS accounting (dashed rectangle). 8) For each RT, i.e. for each transcript, two proportions of sbsRT were calculated, one in healthy and one in cancer experiments. Finally, the two proportions were tested statistically for equality.
Bianchetti et al. BMC Cancer 2012 12:509 doi:10.1186/1471-2407-12-509
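
Steps 3 and 6 to 8 of the workflow can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that each experiment is a dictionary mapping a tag sequence to its count, that an sbsRT is a tag differing from its RT at exactly one position, and that the sbsRT proportion is the sbsRT count divided by the sum of RT and sbsRT counts. The caption does not name the statistical test, so a pooled two-proportion z-test is swapped in here purely as an example. All function names are hypothetical.

```python
import math


def select_tags(healthy, cancer, k):
    """Step 3: keep tags present in at least a fraction k of the healthy
    experiments AND at least a fraction k of the cancer experiments.
    Each experiment is assumed to be a dict {tag: count}."""
    def frequent(experiments):
        seen = {}
        for exp in experiments:
            for tag, count in exp.items():
                if count > 0:
                    seen[tag] = seen.get(tag, 0) + 1
        return {t for t, n in seen.items() if n >= k * len(experiments)}

    return frequent(healthy) & frequent(cancer)


def single_base_variants(tag):
    """Step 6 (assumed definition): every sequence that differs from
    `tag` at exactly one position."""
    return {
        tag[:i] + base + tag[i + 1:]
        for i in range(len(tag))
        for base in "ACGT"
        if base != tag[i]
    }


def sbs_counts(rt, experiments, known_snp_variants=frozenset()):
    """Steps 6 and 7: pool counts over one group of experiments.
    Variants matching a known SNP are excluded. Returns
    (sbsRT count, RT count + sbsRT count); the choice of denominator
    is an assumption of this sketch."""
    variants = single_base_variants(rt) - set(known_snp_variants)
    rt_count = sum(exp.get(rt, 0) for exp in experiments)
    sbs_count = sum(exp.get(v, 0) for exp in experiments for v in variants)
    return sbs_count, rt_count + sbs_count


def two_proportion_z_test(x1, n1, x2, n2):
    """Step 8 (assumed test): two-sided pooled z-test of the hypothesis
    that x1/n1 and x2/n2 estimate the same proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # identical degenerate proportions
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For a given RT, calling `sbs_counts` once on the healthy group and once on the cancer group gives the two count pairs that `two_proportion_z_test` compares. When counts are small, an exact test may be a better fit than the normal approximation used here.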