This article is part of the supplement: International Workshop on Computational Systems Biology: Approaches to Analysis of Genome Complexity and Regulatory Gene Networks

Open Access Research

Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset

Fernanda L Sirota¹*, Hong-Sain Ooi¹, Tobias Gattermayer¹, Georg Schneider¹, Frank Eisenhaber¹,²,³ and Sebastian Maurer-Stroh¹

Author Affiliations

1 Biomolecular Function Discovery Division, Bioinformatics Institute (BII), Agency for Science Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore

2 Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, 117543, Singapore

3 School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, 637553, Singapore


BMC Genomics 2010, 11(Suppl 1):S15  doi:10.1186/1471-2164-11-S1-S15

Published: 10 February 2010



Algorithms that predict protein disorder play an important role in structural and functional genomics, because disordered regions have been reported to participate in important cellular processes. Consequently, various groups have independently developed disorder prediction methods based on different underlying principles. To assess their usability in automated workflows, we set out to identify parameter settings and thresholds under which the performance of these predictors becomes directly comparable.


First, we derived a new benchmark set that accounts for different flavours of disorder and complements them with a similar amount of order annotation derived for the same protein set. We show that, with the recommended default parameters, the programs tested produce a wide range of predictions at different levels of specificity and sensitivity. We identify settings at which the different predictors have the same false positive rate. We also assess the conditions under which sets of predictors can be run together to derive consensus or complementary predictions. This is useful for proteome-wide applications where high specificity is required, such as our in-house sequence analysis pipeline and the ANNIE webserver.
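The idea of matching predictors at a common false positive rate can be illustrated with a minimal Python sketch. It is not the procedure used in this work: the function names (`threshold_for_fpr`, `rates`, `consensus`), the toy scores, and the convention that a residue is called disordered when its score is strictly greater than the cutoff are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def threshold_for_fpr(scores: Sequence[float],
                      is_disordered: Sequence[bool],
                      target_fpr: float) -> float:
    """Smallest cutoff whose false positive rate does not exceed target_fpr.

    Assumption: a residue is called disordered when score > cutoff.
    """
    # Scores of residues annotated as ordered (the negatives), highest first.
    ordered = sorted((s for s, d in zip(scores, is_disordered) if not d),
                     reverse=True)
    if not ordered:
        raise ValueError("benchmark needs at least one ordered residue")
    # Number of ordered residues we may wrongly call disordered.
    allowed = int(target_fpr * len(ordered))
    # At most `allowed` ordered scores lie strictly above this value.
    return ordered[min(allowed, len(ordered) - 1)]


def rates(scores: Sequence[float],
          is_disordered: Sequence[bool],
          cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, false positive rate) at the given cutoff."""
    tp = fp = pos = neg = 0
    for s, d in zip(scores, is_disordered):
        called = s > cutoff
        if d:
            pos += 1
            tp += called
        else:
            neg += 1
            fp += called
    return tp / pos, fp / neg


def consensus(calls: Sequence[Sequence[bool]], min_votes: int) -> List[bool]:
    """Per-residue consensus: disordered if at least min_votes predictors agree."""
    return [sum(column) >= min_votes for column in zip(*calls)]


# Toy benchmark: True = annotated disordered, False = annotated ordered.
labels = [True, True, False, True, False, False, True, False]
# Two hypothetical predictors that score on different scales.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [5.0, 1.0, 2.0, 4.0, 3.0, 0.5, 4.5, 0.2]

cut_a = threshold_for_fpr(scores_a, labels, 0.25)
cut_b = threshold_for_fpr(scores_b, labels, 0.25)
calls_a = [s > cut_a for s in scores_a]
calls_b = [s > cut_b for s in scores_b]
both = consensus([calls_a, calls_b], min_votes=2)
```

Once each predictor has its own cutoff at the same false positive rate, the sensitivities returned by `rates` can be compared directly, and `min_votes` controls how strict the consensus is (requiring all predictors to agree favours specificity over sensitivity).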


This work identifies parameter settings and thresholds at which a selection of disorder predictors produce comparable results at a desired level of specificity on a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.