BMC Bioinformatics Volume 8
Research article
Identifying protein complexes directly from high-throughput TAP data with Markov random fields
Wasinee Rungsarityotin (1,3), Roland Krause (1,2), Arno Schödl (3) and Alexander Schliep (1)
(1) Max Planck Institute for Molecular Genetics, Department of Computational Molecular Biology, Ihnestr. 73, D-14195 Berlin, Germany
(2) Max Planck Institute for Infection Biology, Department of Cellular Microbiology, Charitéplatz 1, D-10117 Berlin, Germany
(3) Think-cell software, Invalidenstr. 43, D-10115 Berlin, Germany
BMC Bioinformatics 2007, 8:482
doi:10.1186/1471-2105-8-482
Published: 19 December 2007
Abstract
Background
Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes.
Results
We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes, based on Markov random fields, explicitly incorporates false negative and false positive errors and is highly robust to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.
Conclusion
We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.
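To illustrate what estimating error rates by maximum likelihood can look like, the following is a minimal sketch and not the authors' Markov random field model. It assumes that every protein pair is observed independently, that a pair in the same complex is missed with a single false negative rate fn, and that a pair in different complexes is reported with a single false positive rate fp. The function names, the toy proteins A to D, and the fixed complex assignment are all hypothetical.

```python
import math
from itertools import combinations


def pair_counts(assignment, observed):
    """Count protein pairs by (same complex?, interaction observed?)."""
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    for a, b in combinations(sorted(assignment), 2):
        same = assignment[a] == assignment[b]
        seen = frozenset((a, b)) in observed
        counts[(same, seen)] += 1
    return counts


def estimate_error_rates(counts):
    """Closed-form maximum likelihood estimates under the assumptions above."""
    within = counts[(True, True)] + counts[(True, False)]
    between = counts[(False, True)] + counts[(False, False)]
    fn = counts[(True, False)] / within if within else 0.0
    fp = counts[(False, True)] / between if between else 0.0
    return fp, fn


def log_likelihood(counts, fp, fn):
    """Log-likelihood of the observations for given error rates.

    A category with a nonzero count must have nonzero probability,
    otherwise math.log raises ValueError.
    """
    def term(n, p):
        return n * math.log(p) if n else 0.0

    return (term(counts[(True, True)], 1.0 - fn)
            + term(counts[(True, False)], fn)
            + term(counts[(False, True)], fp)
            + term(counts[(False, False)], 1.0 - fp))


# Hypothetical toy data: proteins A, B, C share complex 0; D is in complex 1.
assignment = {"A": 0, "B": 0, "C": 0, "D": 1}
observed = {frozenset(("A", "B")), frozenset(("A", "C")), frozenset(("C", "D"))}

counts = pair_counts(assignment, observed)
fp, fn = estimate_error_rates(counts)
best = log_likelihood(counts, fp, fn)
```

In this simplified setting no reference set of known complexes is needed: the rates follow from the observations and the current assignment alone. A full method would also have to search over assignments, which this sketch leaves out.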