Table 5

Performance of GSP algorithm

| Method | (k1, b) | Geno 2007 document | Geno 2007 passage | Geno 2007 passage2 | Geno 2006 document | Geno 2006 passage | Geno 2005 document | Geno 2004 document | HARD 2004 document | HARD 2004 passage |
|---|---|---|---|---|---|---|---|---|---|---|
| GSP | (0.4,2.0) | 0.1066 (-1.87%) | 0.0338 (-98.75%) | 0.0149 (-58.28%) | 0.1892 (-7.09%) | 0.0242 (-25.95%) | 0.1867 (-4.96%) | 0.2723 (-7.74%) | 0.2358 (-3.72%) | 0.2639 (-0.15%) |
| GSP | (0.5,1.3) | 0.149 (-6.18%) | 0.0843 (-86.59%) | 0.0456 (-36.85%) | 0.2855 (-8.17%) | 0.0466 (-26.31%) | 0.2423 (-6.88%) | 0.3165 (-7.01%) | 0.2562 (-8.57%) | 0.3001 (-0.54%) |
| GSP | (1.0,1.0) | 0.1839 (-3.32%) | 0.0898 (-0.60%) | 0.0357 (-9.21%) | 0.2757 (-5.46%) | 0.0402 (-19.40%) | 0.2385 (-6.36%) | 0.3166 (-7.55%) | 0.2501 (-0.83%) | 0.2842 (-4.56%) |
| GSP | (1.2,0.75) | 0.1905 (-5.35%) | 0.0714 (-10.11%) | 0.0658 (-13.79%) | 0.3174 (-6.11%) | 0.0404 (-11.65%) | 0.2655 (-7.62%) | 0.3293 (-8.11%) | 0.2589 (-1.07%) | 0.2776 (-0.65%) |
| GSP | (2.0,0.4) | 0.1931 (-4.62%) | 0.0657 (-3.79%) | 0.0667 (-4.02%) | 0.3203 (-7.85%) | 0.0403 (-11.40%) | 0.2588 (-6.89%) | 0.3206 (-7.96%) | 0.2567 (-8.65%) | 0.2916 (-0.73%) |
| GSP | Best | 0.1931 | 0.0898 | 0.0667 | 0.3203 | 0.0466 | 0.2655 | 0.3293 | 0.2589 | 0.3001 |
| Baselines | Best | 0.2108 | 0.0963 | 0.0641 | 0.3529 | 0.0718 | 0.2874 | 0.3584 | 0.281 | 0.2985 |
| TA | Best | 0.2724 | 0.1611 | 0.0762 | 0.3549 | 0.101 | 0.3085 | 0.3606 | 0.2845 | 0.3031 |

The GSP algorithm serves as a comparison to the proposed approach: (1) after mapping the GSP algorithm to our research problem, the candidate 1-sequences are all the keywords, and the candidate k-sequences are generated from the frequent (k - 1)-sequences; (2) the counts of the candidates are treated as following a non-parametric distribution, and the lower bound of its 95% confidence interval is used as the minimum support value for the GSP algorithm; (3) only the paragraph index is considered, under five parameter settings of (k1, b); (4) the best results of the GSP algorithm are compared with the best results of the baselines and of the proposed term association approach; (5) "TA" stands for term association; (6) the values in parentheses are the relative rates of improvement over the original baselines.
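As an illustration of notes (1) and (2), the following minimal Python sketch shows one way the GSP-style candidate generation and the data-driven minimum support could be put together. It is not the implementation used in the study. The function names (`is_subsequence`, `count_support`, `min_support`, `join`, `gsp`) are hypothetical. The sketch assumes that the "lower bound of the 95% confidence interval" can be read as the 2.5th percentile of the empirical candidate counts, and that a sequence is supported by a paragraph when its terms occur there in order, with gaps allowed.

```python
import math


def is_subsequence(seq, tokens):
    """Return True if the terms of seq occur in tokens in order (gaps allowed)."""
    it = iter(tokens)
    return all(term in it for term in seq)


def count_support(candidates, paragraphs):
    """Count, for each candidate sequence, the paragraphs that contain it."""
    return {c: sum(is_subsequence(c, p) for p in paragraphs) for c in candidates}


def min_support(counts, lower_pct=2.5):
    """Nearest-rank lower percentile of the candidate counts.

    Assumption: this stands in for the lower bound of the 95% confidence
    interval of a non-parametric distribution of counts.
    """
    ordered = sorted(counts)
    rank = max(1, math.ceil(lower_pct / 100 * len(ordered)))
    return ordered[rank - 1]


def join(frequent):
    """GSP join step: merge s1 and s2 when s1 without its first term
    equals s2 without its last term."""
    return {s1 + s2[-1:] for s1 in frequent for s2 in frequent if s1[1:] == s2[:-1]}


def gsp(paragraphs, keywords, max_len=3, lower_pct=2.5):
    """Return a dict mapping each frequent sequence (a tuple of terms) to its support."""
    candidates = {(k,) for k in keywords}  # candidate 1-sequences: all keywords
    frequent = {}
    for _ in range(max_len):
        if not candidates:
            break
        support = count_support(candidates, paragraphs)
        threshold = min_support(support.values(), lower_pct)
        # Sketch-only guard: never keep a sequence that occurs in no paragraph.
        threshold = max(threshold, 1)
        level = {c: n for c, n in support.items() if n >= threshold}
        if not level:
            break
        frequent.update(level)
        candidates = join(set(level))  # k-sequences from frequent (k-1)-sequences
    return frequent


paragraphs = [
    ["gene", "protein", "disease"],
    ["gene", "disease"],
    ["protein", "gene"],
]
result = gsp(paragraphs, ["gene", "protein", "disease"], max_len=2)
```

Under this percentile reading the threshold is deliberately permissive, so most candidates that occur at all survive each level. A stricter cut-off could be tried by raising `lower_pct`.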

Hu et al. BMC Bioinformatics 2012 13(Suppl 9):S2 doi:10.1186/1471-2105-13-S9-S2
