Table 1

Problem-specific Datasets.

| Problem              | Source       | Type       | #C | #Seq | #Res   | #CV | %  |
|----------------------|--------------|------------|----|------|--------|-----|----|
| Disorder Prediction  | DisPro [7]   | Binary     | 2  | 723  | 215612 | 10  | 30 |
| Protein-DNA Site     | DISIS [6]    | Binary     | 2  | 693  | 127240 | 3   | 20 |
| Residue-wise Contact | SVM [15]     | Regression | ∞  | 680  | 120421 | 15  | 40 |
| Local Structure      | Profnet [35] | Multiclass | 16 | 1600 | 286238 | 3   | 40 |

#C, #Seq, #Res, and #CV denote the number of classes, sequences, residues, and cross-validation folds, respectively; % denotes the maximum pairwise sequence identity between the sequences. ∞ in the #C column represents the regression problem.

Rangwala et al. BMC Bioinformatics 2009 10:439   doi:10.1186/1471-2105-10-439
