The iASeq model. (a) An example of the data structure. Each row represents a SNP, and each column holds the read counts for either the reference allele (R) or the non-reference allele (N) from one ChIP-seq sample in a dataset. A dataset can be a TF ChIP-seq experiment or an HM ChIP-seq experiment and can have multiple replicate samples (Rep). iASeq assumes the following data-generating process. (b) First, SNPs belong to K + 1 classes with different ASB patterns. For each SNP i, a class label a_i is randomly assigned according to a class abundance probability vector Π. Given the class label, a configuration [b_id, c_id] is generated for each SNP in each dataset according to the probabilistic allele-specificity patterns specified by two vectors, V_k and W_k. In the figure, the darkness of each cell in V and W represents the probability that b_id or c_id equals 1. (c) Next, a skewing probability p_idj is generated for each SNP i, dataset d and replicate sample j based on [b_id, c_id]. For NS SNPs, p_idj in each sample follows a Beta distribution (blue lines). For SR SNPs, p_idj is uniformly distributed on the interval [p_dj0, 1], where p_dj0 is the mean of the background Beta distribution (dark blue lines). For SN SNPs, p_idj is uniformly distributed on the interval [0, p_dj0] (light blue lines). (d) Finally, given the configuration [b_id, c_id], the skewing probability p_idj and a total read count n_idj for SNP i, dataset d and sample j, the read count for each allele is generated according to a binomial distribution. The length of the orange bar represents the non-reference allele read count, and the length of the red bar represents the reference allele read count.
Wei et al. BMC Genomics 2012 13:681 doi:10.1186/1471-2164-13-681
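The generative steps (b) to (d) in the caption can be illustrated with a minimal simulation sketch. This is not the authors' implementation. The function names, argument layout and toy parameter values are hypothetical, and two points the caption leaves open are filled in by assumption: b_id = 1 is taken to mark a skewed SNP, with c_id = 1 (drawn only when b_id = 1) marking skew toward the reference allele, and p_idj is treated as the probability that a read carries the reference allele, which is consistent with SR SNPs drawing from [p_dj0, 1].

```python
import random


def draw_binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_snp(rng, class_probs, V, W, beta_params, totals):
    """Simulate one SNP; returns (class label, list of per-dataset records).

    class_probs       : K + 1 class abundances (the vector Pi)
    V[k][d], W[k][d]  : probability that b_id / c_id equals 1 in class k
    beta_params[d][j] : (alpha, beta) of the background Beta, dataset d, replicate j
    totals[d][j]      : total read count n_idj
    """
    # (b) class label a_i drawn from the class abundance vector Pi
    a = rng.choices(range(len(class_probs)), weights=class_probs)[0]
    records = []
    for d, reps in enumerate(beta_params):
        # ASSUMPTION: b = 1 marks a skewed SNP; c = 1 marks skew toward R
        b = int(rng.random() < V[a][d])
        c = int(b == 1 and rng.random() < W[a][d])
        state = "NS" if b == 0 else ("SR" if c == 1 else "SN")
        counts = []
        for j, (alpha, beta) in enumerate(reps):
            p0 = alpha / (alpha + beta)  # mean of the background Beta (p_dj0)
            # (c) skewing probability p_idj, depending on the state
            if state == "NS":
                p = rng.betavariate(alpha, beta)
            elif state == "SR":
                p = rng.uniform(p0, 1.0)
            else:
                p = rng.uniform(0.0, p0)
            # (d) split the total read count n_idj between the two alleles
            n = totals[d][j]
            ref = draw_binomial(rng, n, p)
            counts.append((ref, n - ref))  # (reference, non-reference)
        records.append({"b": b, "c": c, "state": state, "counts": counts})
    return a, records


# Toy setup (all values hypothetical): K = 1, two datasets,
# the first with two replicates and the second with one.
rng = random.Random(0)
class_probs = [0.8, 0.2]
V = [[0.0, 0.0], [0.9, 0.9]]
W = [[0.5, 0.5], [0.9, 0.1]]
beta_params = [[(10, 10), (10, 10)], [(8, 12)]]
totals = [[30, 25], [40]]
label, records = simulate_snp(rng, class_probs, V, W, beta_params, totals)
```

Each record holds the drawn configuration, the implied state (NS, SR or SN) and one (reference, non-reference) count pair per replicate, mirroring the R and N columns of panel (a).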