A survey of protein interaction data and multigenic inherited disorders
1 Department for Molecular Biosciences, University of Oslo, P.O. Box 1041 Blindern, 0316, Oslo, Norway
2 The Biotechnology Centre of Oslo, University of Oslo, P.O. Box 1125 Blindern, 0317, Oslo, Norway
3 Scientific Computing Group, University of Oslo, P.O. Box 1059 Blindern, Oslo, Norway
BMC Bioinformatics 2013, 14:47 doi:10.1186/1471-2105-14-47
Published: 11 February 2013
Multigenic diseases are often associated with protein complexes or interactions involved in the same pathway. We set out to estimate how often this holds, given a consolidated protein interaction data set. The study highlights issues of data integration and data representation.
We constructed 497 multigenic disease groups from OMIM and tested them for overlaps with interaction and pathway data. A total of 159 disease groups had significant overlaps with protein interaction data consolidated by iRefIndex. A further 68 disease groups had overlaps found only in the KEGG pathway database. No single database contained all significant overlaps, which underlines the importance of data integration. We also found that disease groups overlapped with all three interaction data types (n-ary data, spoke-represented complexes and binary data), which underlines the importance of considering each of these data types separately.
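As an illustration of what testing a disease group for overlap with an interaction record can look like, the following is a minimal sketch only. The abstract does not state which statistical test was used, so the one-sided hypergeometric test here is an assumption, and the function names (`hypergeom_tail`, `overlap_pvalue`) and the toy gene identifiers are hypothetical. It is written in Python for illustration and is not the accompanying R script.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (one-sided hypergeometric tail)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def overlap_pvalue(disease_genes, complex_genes, universe):
    """Return the shared genes and a p-value for the overlap between one
    disease group and one complex/pathway (assumed test, not the paper's)."""
    universe = set(universe)
    disease = set(disease_genes) & universe   # genes of the disease group
    members = set(complex_genes) & universe   # genes of the complex/pathway
    shared = disease & members
    p = hypergeom_tail(len(shared), len(universe), len(members), len(disease))
    return shared, p


# Toy data (hypothetical identifiers): 20 background genes,
# a 3-gene disease group and a 4-member complex sharing 2 genes.
universe = {f"g{i}" for i in range(20)}
shared, p = overlap_pvalue({"g0", "g1", "g2"}, {"g0", "g1", "g5", "g6"}, universe)
print(sorted(shared), p)
```

In a survey across many disease groups and many complexes, p-values of this kind would normally also be corrected for multiple testing; that step is omitted here.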
Almost half of our multigenic disease groups could potentially be explained by protein complexes and pathways. However, no single database or data type accounted for all of the overlapping disease groups, which suggests that no database has yet been systematically curated for complex and pathway data related to these diseases. This survey provides a basis for further curation efforts to confirm and search for overlaps between diseases and interaction data. The accompanying R script can be used to reproduce the work and to track progress in this area as databases change. Disease group overlaps can be further explored using the iRefscape plugin for Cytoscape.