This article is part of the supplement: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Bioinformatics

Open Access Research

Learning by aggregating experts and filtering novices: a solution to crowdsourcing problems in bioinformatics

Ping Zhang1, Weidan Cao2 and Zoran Obradovic3*

Author Affiliations

1 Healthcare Analytics Research, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

2 School of Media and Communication, Temple University, Philadelphia, PA 19122, USA

3 Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA 19122, USA

BMC Bioinformatics 2013, 14(Suppl 12):S5  doi:10.1186/1471-2105-14-S12-S5

Published: 24 September 2013

Abstract

Background

In many biomedical applications, classification models must be developed from noisy annotations. Recently, various methods have addressed this scenario by relying on unreliable annotations obtained from multiple sources.

Results

We propose a probabilistic classification algorithm based on labels obtained from multiple noisy annotators. The new algorithm is capable of eliminating annotations provided by novice labellers and of providing a more accurate estimate of the ground truth through consensus labelling based on the higher-quality annotations. The approach is evaluated on text classification and on prediction of protein disorder. Our study suggests that the new method can achieve higher accuracy, effectiveness and performance than the alternatives.

Conclusions

The proposed method is applicable to meta-learning from multiple existing classification models and from noisy annotations provided by humans. It is particularly beneficial when many of the annotations come from novice labellers. In addition, the proposed method can further characterize each annotator, which can help in developing more accurate classifiers by identifying the most competent annotators for each data instance.
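
To make the idea of filtering novices before forming a consensus concrete, the following is a minimal sketch in Python. It is not the probabilistic algorithm proposed in the article: it assumes binary labels, replaces the probabilistic model with a plain majority vote, and drops annotators using a fixed agreement threshold. The names `filter_and_aggregate`, `consensus_labels`, `min_agreement` and the toy annotators are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the smallest label for determinism."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def consensus_labels(annotations, annotators):
    """Per-instance majority vote over the given annotators."""
    n_items = len(annotations[annotators[0]])
    return [majority_vote([annotations[a][i] for a in annotators])
            for i in range(n_items)]


def filter_and_aggregate(annotations, min_agreement=0.6, max_iter=10):
    """Illustrative heuristic (an assumption, not the article's method):
    repeatedly form a consensus, drop annotators who agree with it on
    fewer than `min_agreement` of the instances, and stop once the set
    of kept annotators no longer changes."""
    kept = sorted(annotations)
    for _ in range(max_iter):
        consensus = consensus_labels(annotations, kept)
        survivors = [
            a for a in kept
            if sum(x == y for x, y in zip(annotations[a], consensus))
            / len(consensus) >= min_agreement
        ]
        if not survivors or survivors == kept:
            break
        kept = survivors
    # Recompute so the returned consensus always matches the kept set.
    return consensus_labels(annotations, kept), kept


# Toy data: three mostly reliable annotators and two unreliable ones.
annotations = {
    "expert_a": [1, 0, 1, 1, 0, 0, 1, 0],
    "expert_b": [1, 0, 1, 1, 0, 0, 0, 0],
    "expert_c": [1, 0, 1, 0, 0, 0, 1, 0],
    "novice_d": [0, 1, 0, 1, 1, 1, 0, 0],
    "novice_e": [0, 1, 1, 0, 1, 0, 0, 1],
}

naive = consensus_labels(annotations, sorted(annotations))
filtered, kept = filter_and_aggregate(annotations)
print(naive)     # [1, 0, 1, 1, 0, 0, 0, 0]
print(filtered)  # [1, 0, 1, 1, 0, 0, 1, 0]
print(kept)      # ['expert_a', 'expert_b', 'expert_c']
```

In this toy example the vote over all five annotators mislabels the seventh instance, while the vote over the three annotators that survive filtering does not. A hard threshold is the simplest possible stand-in; a probabilistic treatment of annotator quality, as described in the abstract, would weight annotators rather than cut them off at a fixed value.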