Table 1

Candidates with desired binding sites adjacent to alternative polyadenylation sites. Identification of cDNAs with potential alternative 3' ends that contain search patterns (base composition, degenerate, and combinatorial) located in the region from 100 nucleotides upstream of the 3' end of the first EST set through the 3' end of the second EST set. The cDNAs marked # and ** were used as examples for the alignment shown in Figure 4.

Number of cDNAs with potential alternative 3' ends and search patterns

Search pattern                                Number of Patterns  Number of cDNAs

A. Base composition
CstF64; U ≥ 4, G ≤ 4, A+C = 0; length = 8     163                 276
SXL; U ≥ 15, G ≤ 2, A+C = 0; length = 17      154                 5
SXL1; U ≥ 8, G+A+C = 0; length = 8            1                   27
SXL2; U ≥ 10, G ≤ 2, A+C = 0; length = 12     79                  25

B. Degenerate motifs
hnRNP F/H/H' (core); GGGA                     1                   232
hnRNP F/H/H'; GGGGA                           1                   78
Rbp1; DCADCUUA                                9                   47
PSI; RCYYCUURYRC                              12                  8
Rbp9; UUUNUUUU                                4                   111

C. Combinatorial motifs
CstF64 + SXL                                  25,102              5#
CstF64 + hnRNP F/H/H' (core)                  163                 178*
CstF64 + hnRNP F/H/H'                         163                 59**
SXL + hnRNP F/H/H' (core)                     154                 4***
PSI + hnRNP F/H/H' (core)                     12                  8***
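
The "Number of Patterns" values for the base composition rules appear consistent with simply counting the sequences over U and G that satisfy each rule, and the CstF64 + SXL value with the product of the two individual counts. The sketch below illustrates that arithmetic. It is not the authors' search code: the function name `count_ug_patterns` and the `rules` dictionary are hypothetical, and the reading of each rule as "at least this many U, at most this many G, no A or C" is an assumption drawn from the table.

```python
from math import comb

def count_ug_patterns(length: int, min_u: int, max_g: int) -> int:
    """Count sequences of `length` over {U, G} with >= min_u U's and <= max_g G's.

    A and C are excluded entirely (A+C = 0), so a sequence is fully
    determined by which positions hold G.
    """
    total = 0
    for g in range(length + 1):
        u = length - g
        if u >= min_u and g <= max_g:
            total += comb(length, g)  # ways to place g G's among `length` positions
    return total

# Hypothetical encoding of the table's rules as (length, min U, max G).
rules = {
    "CstF64": (8, 4, 4),
    "SXL": (17, 15, 2),
    "SXL1": (8, 8, 0),
    "SXL2": (12, 10, 2),
}

counts = {name: count_ug_patterns(*rule) for name, rule in rules.items()}

# A combinatorial search pairs every pattern of one rule with every pattern
# of the other, so the number of pattern pairs is the product of the counts.
cstf64_sxl_pairs = counts["CstF64"] * counts["SXL"]

print(counts)
print(cstf64_sxl_pairs)
```

Counting by the number of G positions avoids enumerating every sequence, which matters little at these lengths but keeps the rule explicit.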


# Since both SXL and CstF64 sites are GU rich, these motifs are not expected to be statistically independent. However, all three Monte Carlo analyses showed that the association was significant (P < 0.001) even when accounting for composition, indicating that SXL sites are more likely to also be CstF64 sites than chance predicts.

* and ** Associations are statistically significant by the G test (* G = 69.8, P = 3.3 × 10^-17, df = 1; ** G = 11.6, P = 0.00033, df = 1). However, these associations were not significant in the Monte Carlo analyses.

*** Associations were not individually significant by the G test, but were significant (P < 0.01) in all three Monte Carlo tests.

Associations of various other combinations of SXL, Rbp1, PSI, and Rbp9 motifs in cDNAs are not statistically significant.
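
For readers unfamiliar with the G test cited in the footnotes, the following is a minimal sketch of a G test of independence on a 2 × 2 table (df = 1), for example cDNAs cross-classified by presence or absence of two motifs. The source gives neither the underlying contingency tables nor how the P values were derived, so the function `g_test_2x2` and the counts passed to it are hypothetical, and the two-sided chi-square tail used here need not reproduce the P values reported above.

```python
from math import erfc, log, sqrt

def g_test_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """G test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (G, p), where G = 2 * sum(O * ln(O / E)) and p is the upper
    tail of the chi-square distribution with 1 degree of freedom.
    Assumes the table is not empty (a + b + c + d > 0).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * o * log(o / expected)

    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p = erfc(sqrt(max(g, 0.0) / 2.0))
    return g, p

# Hypothetical counts: rows = motif 1 present/absent, columns = motif 2 present/absent.
g_assoc, p_assoc = g_test_2x2(30, 10, 10, 30)
g_indep, p_indep = g_test_2x2(10, 20, 30, 60)  # proportions identical in both rows
print(g_assoc, p_assoc)
print(g_indep, p_indep)
```

A test of this kind treats cDNAs as independent observations; the Monte Carlo analyses mentioned in the footnotes address the separate question of whether shared base composition alone could produce the association.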

Hamady et al. BMC Bioinformatics 2006 7:1   doi:10.1186/1471-2105-7-1
