Designing and benchmarking the MULTICOM protein structure prediction system
1 Computer Science Department, University of Missouri, Columbia, MO, USA
2 Informatics Institute, University of Missouri, Columbia, MO, USA
3 C. Bond Life Science Center, University of Missouri, Columbia, MO, USA
BMC Structural Biology 2013, 13:2 doi:10.1186/1472-6807-13-2Published: 27 February 2013
Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor.
Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction.
Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/ webcite.