Table 1 

Candidates with desired binding sites adjacent to alternative polyadenylation sites. Identification of cDNAs with potential alternative 3' ends and various search patterns (base composition, degenerate, and combinatorial) located in the region from 100 nucleotides upstream of the 3' end of the first EST set through the 3' end of the second EST set. The cDNAs marked # and ** were used as examples for the alignment shown in Figure 4.

Number of cDNAs with potential alternative 3' ends and search patterns

| Search pattern | Number of patterns | Number of cDNAs |
|---|---|---|
| A. Base composition | | |
| CstF64; U ≥ 4, G ≤ 4, A+C = 0; length = 8 | 163 | 276 |
| SXL; U ≥ 15, G ≤ 2, A+C = 0; length = 17 | 154 | 5 |
| SXL1; U ≥ 8, G+A+C = 0; length = 8 | 1 | 27 |
| SXL2; U ≥ 10, G ≤ 2, A+C = 0; length = 12 | 79 | 25 |
| B. Degenerate motifs | | |
| hnRNP F/H/H' (core); GGGA | 1 | 232 |
| hnRNP F/H/H'; GGGGA | 1 | 78 |
| Rbp1; DCADCUUA | 9 | 47 |
| PSI; RCYYCUURYRC | 12 | 8 |
| Rbp9; UUUNUUUU | 4 | 111 |
| C. Combinatorial motifs | | |
| CstF64 + SXL | 25,102 | 5# |
| CstF64 + hnRNP F/H/H' (core) | 163 | 178* |
| CstF64 + hnRNP F/H/H' | 163 | 59** |
| SXL + hnRNP F/H/H' (core) | 154 | 4*** |
| PSI + hnRNP F/H/H' (core) | 12 | 8*** |
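The pattern counts listed for the base composition searches can be cross-checked with a short enumeration. The sketch below assumes each constraint (for example U ≥ 4, G ≤ 4, A+C = 0, length = 8) means "every G/U-only string of that length that meets the counts". The function names `composition_patterns` and `find_sites` are hypothetical and are not taken from the authors' software. Under that reading the four base composition rows give 163, 154, 1, and 79 patterns, and 163 × 154 gives the 25,102 listed for CstF64 + SXL.

```python
from itertools import product


def composition_patterns(length, min_u, max_g):
    """Enumerate G/U-only strings with at least min_u U and at most max_g G.

    Assumed reading of a base composition constraint such as
    'U >= 4, G <= 4, A+C = 0; length = 8'.
    """
    return [
        "".join(p)
        for p in product("GU", repeat=length)
        if p.count("U") >= min_u and p.count("G") <= max_g
    ]


def find_sites(seq, length, min_u, max_g):
    """Return 0-based start positions of windows that satisfy the constraint.

    T is treated as U so that cDNA sequence can be scanned directly.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if (set(window) <= {"G", "U"}
                and window.count("U") >= min_u
                and window.count("G") <= max_g):
            hits.append(i)
    return hits


# Pattern counts for the four base composition rows of the table.
cstf64 = composition_patterns(8, 4, 4)
sxl = composition_patterns(17, 15, 2)
sxl1 = composition_patterns(8, 8, 0)
sxl2 = composition_patterns(12, 10, 2)
print(len(cstf64), len(sxl), len(sxl1), len(sxl2))  # 163 154 1 79
print(len(cstf64) * len(sxl))                       # 25102

# Scanning a made-up sequence for CstF64-like windows.
print(find_sites("ACGUUUUGUGAC", 8, 4, 4))          # [2]
```

Scanning windows directly, rather than matching each enumerated string, keeps the search linear in sequence length even for the 17-nucleotide SXL constraint.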


# Since both SXL and CstF64 sites are GU rich, these motifs are not expected to be statistically independent. However, all three Monte Carlo analyses showed that the association was significant (P < 0.001) even when accounting for composition, indicating that SXL sites are more likely to also be CstF64 sites than chance predicts.

* and ** Associations are statistically significant by the G test (* G = 69.8, P = 3.3 × 10^{-17}, df = 1; ** G = 11.6, P = 0.00033, df = 1). However, these associations were not significant in the Monte Carlo analyses.

*** Associations are not individually significant by the G test, but are significant (P < 0.01) in all three Monte Carlo tests.

Associations of various other combinations of SXL, Rbp1, PSI, and Rbp9 motifs in cDNAs are not statistically significant.
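The relation between a G value and its P value at df = 1 can be illustrated with a small calculation. The sketch below is a generic likelihood-ratio G test for a 2 × 2 contingency table, not the authors' code. The counts in the usage example are made up for illustration, and `g_statistic` and `chi2_tail_df1` are hypothetical names. For df = 1 the chi-square upper tail reduces to erfc(sqrt(G/2)), so only the standard library is needed.

```python
import math


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) over the cells of a contingency table.

    Expected counts E come from the row and column totals (independence).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    return 2.0 * g


def chi2_tail_df1(g):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(g / 2.0))


# Hypothetical counts: rows = motif A present/absent, columns = motif B present/absent.
example = [[30, 10], [20, 40]]
g = g_statistic(example)
print(g, chi2_tail_df1(g))

# Tail probabilities for the two G values quoted in the footnote.
print(chi2_tail_df1(69.8), chi2_tail_df1(11.6))
```

Half of the tail for G = 11.6 is about 0.00033, and half of the tail for G = 69.8 is about 3.3 × 10^{-17}. This would be consistent with a one-sided convention, although the table does not state one.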

Hamady et al. BMC Bioinformatics 2006 7:1 doi:10.1186/1471-2105-7-1