Table 1

Sequence datasets used to generate training sets.

Training dataset | Sources                                            | Initial number of sequences | Sequences >120 AA | Size after redundancy removal
ntm              | PDB-REPRDB [32]                                    | 3159                        | 2290              | 1763
ahtm             | Sanger all-alpha membrane datasets A, B and C [33] | 189                         | 166               | 132
bbtm             | TC-DB [35], UniProt [34] and PDB [5]               | 1126                        | 1107              | 196

Three training datasets were generated from the sources listed above. Each dataset was filtered to retain only sequences longer than 120 AA and was then clustered to remove redundancy.
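The two preprocessing steps in the footnote (a length filter, then clustering to remove redundancy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The clustering method and identity cutoff are not given here, so the greedy strategy, the 0.9 threshold, the function names, and the use of `difflib` similarity as a crude stand-in for alignment-based sequence identity are all hypothetical.

```python
from difflib import SequenceMatcher

MIN_LENGTH = 120          # from the table: only sequences >120 AA are kept
IDENTITY_THRESHOLD = 0.9  # HYPOTHETICAL cutoff; not stated in the source


def length_filter(seqs, min_length=MIN_LENGTH):
    """Keep only sequences strictly longer than min_length residues."""
    return {name: s for name, s in seqs.items() if len(s) > min_length}


def similarity(a, b):
    """Crude proxy for pairwise identity (ASSUMPTION: no real alignment)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundancy(seqs, threshold=IDENTITY_THRESHOLD):
    """Greedy clustering: visit sequences longest first; a sequence becomes a
    new representative only if it is below threshold to every existing one."""
    representatives = {}
    for name, s in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if all(similarity(s, r) < threshold for r in representatives.values()):
            representatives[name] = s
    return representatives


# Toy input (hypothetical sequences, for illustration only)
toy = {
    "short": "M" * 50,                # 50 residues: removed by the length filter
    "a": "ACDEFGHIKL" * 13,           # 130 residues
    "a_copy": "ACDEFGHIKL" * 13,      # identical to "a": removed as redundant
    "b": "MNPQRSTVWY" * 14,           # 140 residues
}
kept = length_filter(toy)
nonredundant = remove_redundancy(kept)
print(sorted(kept))          # ['a', 'a_copy', 'b']
print(sorted(nonredundant))  # ['a', 'b']
```

Visiting the longest sequences first makes each cluster's representative its longest member. A production pipeline would more typically compute identity from pairwise alignments or use a dedicated clustering tool rather than `difflib`.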

Garrow et al. BMC Bioinformatics 2005 6:56   doi:10.1186/1471-2105-6-56