Sequence datasets used to generate training sets.

| Training dataset | Sources | Initial number of sequences | Sequences >120 AA | Size after redundancy removal |
|---|---|---|---|---|
| ntm | PDB-REPRDB [32] | 3159 | 2290 | 1763 |
| ahtm | Sanger all-alpha membrane datasets A, B and C [33] | 189 | 166 | 132 |
| bbtm | TC-DB [35], Uniprot [34] and PDB [5] | 1126 | 1107 | 196 |

Three training datasets were generated using sequences from various sources. Datasets were filtered to remove sequences shorter than 120 AA and then clustered to remove redundancy.
Garrow et al. BMC Bioinformatics 2005 6:56 doi:10.1186/1471-2105-6-56
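
The two preprocessing steps named in the caption (length filtering, then redundancy removal) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the clustering tool and identity cutoff are not given here, so the sketch substitutes a greedy pass that uses `difflib.SequenceMatcher` as a crude stand-in for sequence identity. The function names, the `max_similarity` value of 0.9, and the strict `>` comparison at 120 AA (taken from the column header) are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def filter_by_length(seqs, min_length=120):
    """Keep sequences longer than min_length residues (assumed strict '>')."""
    return {name: seq for name, seq in seqs.items() if len(seq) > min_length}


def remove_redundancy(seqs, max_similarity=0.9):
    """Greedy redundancy removal (illustrative stand-in, not the paper's method).

    Sequences are visited longest first; a sequence is kept only if its
    similarity to every already-kept representative is below max_similarity.
    SequenceMatcher.ratio() is a rough proxy for alignment-based identity.
    """
    representatives = {}
    ordered = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    for name, seq in ordered:
        is_novel = all(
            SequenceMatcher(None, seq, rep, autojunk=False).ratio() < max_similarity
            for rep in representatives.values()
        )
        if is_novel:
            representatives[name] = seq
    return representatives


# Toy input with hypothetical names; real sequences would be read from files.
toy = {
    "seq_a": "M" * 130,            # long enough, kept as a representative
    "seq_b": "M" * 130,            # identical to seq_a, removed as redundant
    "seq_c": "A" * 100,            # too short, removed by the length filter
    "seq_d": "ACDEFGHIKL" * 13,    # long enough and dissimilar, kept
}
kept = remove_redundancy(filter_by_length(toy))
print(sorted(kept))  # ['seq_a', 'seq_d']
```

The pairwise comparison is quadratic in the number of sequences, which is acceptable for datasets of the sizes listed in the table but would be slow for much larger collections; a dedicated clustering tool would be the usual choice there.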