This article is part of the supplement: International Workshop on Computational Systems Biology: Approaches to Analysis of Genome Complexity and Regulatory Gene Networks

Open Access Research

Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset

Fernanda L Sirota1*, Hong-Sain Ooi1, Tobias Gattermayer1, Georg Schneider1, Frank Eisenhaber1,2,3 and Sebastian Maurer-Stroh1

Author Affiliations

1 Biomolecular Function Discovery Division, Bioinformatics Institute (BII), Agency for Science Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore

2 Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, 117543, Singapore

3 School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, 637553, Singapore

BMC Genomics 2010, 11(Suppl 1):S15  doi:10.1186/1471-2164-11-S1-S15

Published: 10 February 2010

Additional files

Additional file 1:

SL dataset. The SL dataset comprises DisProt r4.5 sequences re-annotated to distinguish short and long disordered regions as well as ordered ones. The file is in FASTA format: each entry has a one-line header starting with the symbol ">", and the amino acid sequence is given in single-letter code. Disordered and ordered regions are annotated following the DisProt convention: disordered regions are denoted by the symbol "#" and ordered regions by the symbol "&", each followed by the first and last residue of the region (e.g. #1-10 &11-70 #71-100, where residues 1 to 10 and 71 to 100 are disordered and residues 11 to 70 are ordered).
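The region notation described above can be read with a few lines of code. The following is a minimal sketch, not part of the published datasets: the function names `parse_regions` and `residue_labels` and the per-residue letters "D", "O" and "-" are illustrative assumptions, and the sketch only assumes that the annotation is available as a string of tokens such as `#1-10 &11-70 #71-100`. Where exactly this string sits within each entry is not specified here and should be checked against the file.

```python
import re
from typing import List, Tuple

# One token is a marker ("#" disordered, "&" ordered) followed by "start-end".
REGION_TOKEN = re.compile(r"([#&])(\d+)-(\d+)")


def parse_regions(annotation: str) -> List[Tuple[str, int, int]]:
    """Return (state, start, end) tuples; positions are 1-based and inclusive."""
    states = {"#": "disordered", "&": "ordered"}
    return [
        (states[marker], int(start), int(end))
        for marker, start, end in REGION_TOKEN.findall(annotation)
    ]


def residue_labels(annotation: str, length: int) -> str:
    """Build a per-residue string: 'D' disordered, 'O' ordered, '-' unannotated.

    The letters are a hypothetical encoding chosen for this sketch.
    """
    labels = ["-"] * length
    for state, start, end in parse_regions(annotation):
        # Clamp to the sequence length so a malformed range cannot overflow.
        for pos in range(max(start, 1), min(end, length) + 1):
            labels[pos - 1] = "D" if state == "disordered" else "O"
    return "".join(labels)
```

For the example given in the file description, `parse_regions("#1-10 &11-70 #71-100")` yields three regions (disordered 1-10, ordered 11-70, disordered 71-100), and `residue_labels("#1-3 &4-5", 6)` returns `"DDDOO-"`. A per-residue string of this kind is convenient when comparing predictor output against the annotation residue by residue.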

Format: TXT Size: 269KB

Open Data

Additional file 2:

Remark 465 dataset. The Remark 465 dataset comprises the DisProt r4.5 sequences in which at least one structural domain was found. Residues listed under Remark 465 in the PDB are annotated here as disordered; consequently, the dataset comprises mainly short disordered regions. The file is in FASTA format: each entry has a one-line header starting with the symbol ">", and the amino acid sequence is given in single-letter code. Disordered and ordered regions are annotated following the DisProt convention: disordered regions are denoted by the symbol "#" and ordered regions by the symbol "&", each followed by the first and last residue of the region (e.g. #1-10 &11-70 #71-100, where residues 1 to 10 and 71 to 100 are disordered and residues 11 to 70 are ordered).

Format: TXT Size: 185KB

Open Data

Additional file 3:

Supplementary Table and Figures 1 and 2.

Format: DOC Size: 330KB

Open Data