Table 1

Abundance and starting nucleotide preference for dimer and trimer loci in D. pulex

Dimers

Motif Type  Count (obs)  Count (exp)  p-value  Starting Pref.    Motif Type  Count (obs)  Count (exp)  p-value  Starting Pref.

TA          48814        42186        0        T                 GC          8444         7623         1E-39    G
AT          35558        42186                                   CG          6802         7623

GA          33951        31919        0        G/T               AC          33773        29999        0        A/T
AG          30185        31919                                   CA          26535        29999
TC          40029        31919                                   TG          35249        29999
CT          23511        31919                                   GT          24437        29999

Trimers

Motif Type  Count (obs)  Count (exp)  p-value  Starting Pref.    Motif Type  Count (obs)  Count (exp)  p-value  Starting Pref.

AAC+        2728         2821         1E-78    T/C               ACT         598          695          7E-15    G/T
ACA*        2431         2821                                    CTA+        734          695
CAA*        3380         2821                                    TAC+        791          695
GTT*        2339         2821                                    AGT         665          695
TGT*        2657         2821                                    TAG+        564          695
TTG*        3390         2821                                    GTA+        815          695

AAG         5734         4486         0        T/A               AGC+        1839         2823         1E-226   C/G
AGA*        3657         4486                                    GCA*        2363         2823
GAA*        4278         4486                                    CAG+        4131         2823
CTT         3393         4486                                    GCT         3115         2823
TCT         3692         4486                                    TGC+        2725         2823
TTC+        6161         4486                                    CTG         2767         2823

AAT*        4099         2937         5E-216   A                 AGG+        971          1039         2E-18    G/T
ATA+        2260         2937                                    GGA*        1222         1039
TAA*        2406         2937                                    GAG+        988          1039
ATT*        3533         2937                                    CCT         905          1039
TAT*        2233         2937                                    TCC         1207         1039
TTA         3093         2937                                    CTC         940          1039

ACC+        855          1089         4E-40    C/T               ATC         1153         1404         8E-46    T
CAC+        1057         1089                                    TCA*        1703         1404
CCA*        1383         1089                                    CAT*        1342         1404
GGT         1034         1089                                    GAT*        1452         1404
GTG         921          1089                                    TGA         1661         1404
TGG*        1285         1089                                    ATG*        1111         1404

ACG         1229         1373         8E-11    G                 CCG         703          742          1E-47    G
CGA         1406         1373                                    CGC+        519          742
GAC+        1561         1373                                    GGC         916          742
CGT         1325         1373                                    CGG         731          742
TCG         1267         1373                                    GCG+        585          742
GTC         1452         1373                                    GCC         995          742


This table shows nonrandom starting nucleotides in both dimers and trimers. Motif type indicates the largest possible repeat identified for each staggered SSR set. P-values were calculated using Pearson's chi-square test against a random expectation, based on the observed and expected frequencies.
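As a rough illustration of the test named above, the following Python sketch computes a Pearson chi-square statistic and p-value for one motif grouping. It assumes that the expected count is the mean of the observed counts in the grouping (consistent with the tabulated expected values) and that the degrees of freedom equal the number of motifs minus one. The function names are hypothetical, and the result is not guaranteed to match the published p-values exactly.

```python
import math


def chi_square_stat(observed):
    """Return (statistic, expected) for a uniform expectation.

    Assumption: every motif in the grouping is expected at the group mean.
    """
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, expected


def chi_square_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the closed forms for df = 1 (erfc) and df = 2 (exp), then the
    recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    half = stat / 2.0
    if df % 2:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Observed counts taken from the table: the GC/CG dimer grouping
# and the AAC/ACA/CAA/GTT/TGT/TTG trimer grouping.
for counts in ([8444, 6802], [2728, 2431, 3380, 2339, 2657, 3390]):
    stat, expected = chi_square_stat(counts)
    p = chi_square_sf(stat, len(counts) - 1)
    print(f"expected={expected:.0f} chi2={stat:.2f} p={p:.1e}")
```

For very large statistics the p-value underflows to zero in double precision.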

The preferential starting base is determined by the highest-frequency SSR in each motif grouping. An asterisk (*) indicates highly used codons; a plus sign (+) indicates rarely used codons.
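One possible reading of this rule can be sketched in Python as follows. The sketch assumes that a "staggered SSR set" is the set of rotations of a motif on one strand (for example GA and AG), and that the preferred starting base is the first base of the most frequent motif in each such set. This interpretation and the function names are assumptions, not taken from the source.

```python
def rotations(motif):
    """All cyclic rotations of a motif, e.g. AAT -> {AAT, ATA, TAA}."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def starting_preferences(counts):
    """Return the preferred starting bases for one motif grouping.

    counts maps each motif (without the * or + markers) to its observed
    count. Motifs are split into rotation sets (an assumed definition of a
    staggered set); the first base of the most frequent motif in each set
    is reported.
    """
    staggered_sets = {}
    for motif, n in counts.items():
        key = min(rotations(motif))  # canonical label for the rotation set
        staggered_sets.setdefault(key, {})[motif] = n
    prefs = set()
    for members in staggered_sets.values():
        top = max(members, key=members.get)
        prefs.add(top[0])
    return sorted(prefs)


# Observed counts taken from the table.
print(starting_preferences({"GA": 33951, "AG": 30185, "TC": 40029, "CT": 23511}))
print(starting_preferences({"TA": 48814, "AT": 35558}))
```

The function returns the bases in alphabetical order, so a table entry written as T/C would be returned as C and T.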

Sung et al. BMC Genomics 2010 11:691 doi:10.1186/1471-2164-11-691
