Table 4

Accuracy estimates (100% – error rate) for TFBS identification under different parameters, based on twenty repetitions of ten-fold cross-validation (200 runs in total)

| Promoter Range | | | | | 1 kb upstream | | 1 kb upstream | | 5 kb upstream | | 5 kb upstream | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PWM | | | | | All Profiles | | Limited Profiles | | All Profiles | | Limited Profiles | |
| Classifier | Expression | Lower | Upper | Feature Selection | Accuracy | SD | Accuracy | SD | Accuracy | SD | Accuracy | SD |
| IB1 | Threshold | 0.2 | 0.8 | InfoGain | 91.56% | 2.35% | 81.65% | 4.22% | 93.06% | 1.80% | 93.27% | 2.35% |
| IB1 | Threshold | 0.33 | 0.66 | InfoGain | 91.89% | 2.95% | 90.72% | 2.90% | 95.57% | 2.04% | 93.62% | 1.78% |
| IB1 | Threshold | 0.2 | 0.8 | ChiSquared | 89.96% | 2.74% | 81.00% | 4.04% | 93.92% | 1.75% | 92.63% | 2.22% |
| IB1 | Threshold | 0.33 | 0.66 | ChiSquared | 91.10% | 2.90% | 90.67% | 2.79% | 94.07% | 2.43% | 93.43% | 2.31% |
| IB1 | Tanh | 0.25 | 0.75 | InfoGain | 92.71% | 2.43% | 92.74% | 2.30% | 92.13% | 2.47% | 92.00% | 3.01% |
| Naive Bayes | Threshold | 0.2 | 0.8 | InfoGain | 90.47% | 2.85% | 82.35% | 3.78% | 96.04% | 1.34% | 94.98% | 1.41% |
| Naive Bayes | Threshold | 0.2 | 0.8 | InfoGain | 91.67% | 2.53% | 83.18% | 3.11% | 94.39% | 1.73% | 93.78% | 2.00% |


Table 4 shows the effects of varying the parameters used for connectivity network construction. The genomic region searched for transcription factor binding sites was either 1000 bp or 5000 bp upstream of known genes. Two collections of position weight matrices (PWMs) were applied: 1) all TRANSFAC matrices relevant to mammalian genes (All Profiles), or 2) the subset of PWMs that TRANSFAC identifies as 'high quality' (Limited Profiles).
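The Accuracy and SD columns of Table 4 summarize 200 runs, obtained from twenty repetitions of ten-fold cross-validation. The following Python sketch illustrates that evaluation scheme only. It is not the authors' pipeline: the 1-nearest-neighbour classifier stands in for IB1 using plain squared Euclidean distance, the toy data and the names `one_nn_predict`, `repeated_kfold_accuracy` and `make_toy_data` are hypothetical, and it assumes that the SD is taken over the individual runs.

```python
import random
import statistics


def one_nn_predict(train, query):
    """Return the label of the training point nearest to `query`.

    Uses squared Euclidean distance; this is an assumed stand-in for IB1.
    """
    nearest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return nearest[1]


def repeated_kfold_accuracy(data, k=10, repetitions=20, seed=0):
    """Run `repetitions` rounds of k-fold cross-validation.

    `data` is a list of (features, label) pairs. Returns the mean accuracy,
    the sample standard deviation across runs, and the number of runs
    (k * repetitions, i.e. 200 with the defaults).
    """
    rng = random.Random(seed)
    run_accuracies = []
    for _ in range(repetitions):
        # Reshuffle before each repetition so the folds differ.
        order = list(range(len(data)))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train = [data[i] for i in order if i not in held_out]
            correct = sum(
                one_nn_predict(train, data[i][0]) == data[i][1]
                for i in test_idx
            )
            # Accuracy for this run = 1 - error rate on the held-out fold.
            run_accuracies.append(correct / len(test_idx))
    return (
        statistics.mean(run_accuracies),
        statistics.stdev(run_accuracies),
        len(run_accuracies),
    )


def make_toy_data(n_per_class=30, seed=1):
    """Two well-separated Gaussian clusters (hypothetical data, not from the paper)."""
    rng = random.Random(seed)
    data = []
    for label, center in (("low", 0.0), ("high", 5.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            data.append((features, label))
    return data


if __name__ == "__main__":
    mean_acc, sd, runs = repeated_kfold_accuracy(make_toy_data())
    print(f"{runs} runs: accuracy {mean_acc:.2%}, SD {sd:.2%}")
```

Reshuffling before each repetition gives twenty different partitions of the same data, so the spread across the 200 per-fold accuracies reflects sensitivity to how the data are split as well as to the classifier itself.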

Tuck et al. BMC Bioinformatics 2006 7:236   doi:10.1186/1471-2105-7-236
