Protein complex detection in PPI networks based on data integration and supervised learning method

Yu, Feng Ying; Yang, Zhi Hao; Hu, Xiao Hua; Sun, Yuan Yuan; Lin, Hong Fei; Wang, Jian

doi:10.1186/1471-2105-16-S12-S3

Volume 16 Supplement 12

Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2014): Bioinformatics

Published: 25 August 2015


BMC Bioinformatics volume 16, Article number: S3 (2015)


Abstract

Background

Revealing protein complexes is important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, which makes it possible to predict protein complexes from protein-protein interaction (PPI) networks. However, the small number of known physical interactions may limit protein complex detection.

Methods

New PPI networks are constructed by integrating existing PPI datasets with the large and readily available PPI data from biomedical literature. Less reliable PPIs are then filtered out based on the semantic and topological similarity of the two interacting proteins. Finally, the supervised learning protein complex detection (SLPC) method, which makes full use of the information in available known complexes, is applied to detect protein complexes on the new PPI networks.

Results

The experimental results of SLPC on two categories of yeast PPI networks (original and denoising) demonstrate the effectiveness of the approach. Compared with the original PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in F-score, accuracy and maximum matching ratio (MMR), respectively, are achieved; compared with the denoising PPI networks, the best average improvements are 3.91, 4.61 and 12.10 percentage units in F-score, accuracy and MMR, respectively; compared with ClusterONE, the state-of-the-art complex detection method, on the denoising extended PPI networks, the average improvements are 26.02 and 22.40 percentage units in F-score and MMR, respectively.

Conclusions

The experimental results show that the performance of SLPC improves substantially when reliable PPI data from biomedical literature are integrated into the original and denoising PPI networks. In addition, our protein complex detection method achieves better performance than ClusterONE.

Background

Protein-protein interactions (PPIs) are fundamental to the biological processes within a cell. Beyond individual interactions, protein interaction graphs contain much more systematic information. Complex formation is one of the typical patterns in these graphs, and many cellular functions are performed by complexes composed of multiple interacting proteins. Many automatic approaches have been proposed to detect protein complexes from PPI networks, such as CMC [1], COACH [2], MCODE [3], MCL [4], CFinder [5], and ClusterONE [6]. However, most of these methods are based on unsupervised graph clustering and predict protein complexes only with pre-defined rules. In contrast, supervised learning methods [7, 8] can exploit information from known complexes and may therefore achieve better performance.

At present, a large number of PPI databases have been created. Gavin [9], Krogan [10] and DIP [11] are popular PPI datasets used by protein complex detection methods. However, these datasets are sparse, since the fraction of known true physical interactions is limited [12]. For example, the average numbers of interactions per protein are 6.98, 7.86 and 9.13 in DIP, Krogan and Gavin, respectively. Nevertheless, large numbers of PPIs can be found in the rapidly growing biomedical literature. Furthermore, since these PPIs are reported by biomedical experts, they are relatively accurate. Integrating them with existing PPI datasets may reduce the sparsity of PPI networks and thereby improve complex detection performance.

In this paper, we present a complex detection approach based on data integration and supervised learning. In this approach, new PPI networks are constructed by integrating PPI datasets with PPI data extracted from biomedical literature by PPIExtractor [13], and less reliable PPIs are then filtered out based on the semantic and topological similarity of the two interacting proteins. Finally, the supervised learning protein complex detection (SLPC) method, which makes full use of the information in available known complexes, is applied to detect protein complexes on the new PPI networks. The experimental results demonstrate that our approach outperforms ClusterONE, the state-of-the-art method.

Methods

Extracting PPI data with PPIExtractor

In our work, we use PPIExtractor [13] to extract PPIs from biomedical literature and then integrate them into the PPI networks. PPIExtractor is a publicly available tool for extracting PPI data from large collections of biomedical literature. Experimental evaluations show that it achieves state-of-the-art performance on a DIP subset in comparable evaluations.

PPIExtractor contains four modules: (i) a Named Entity Recognition (NER) module, which identifies protein names in the biomedical literature; (ii) a normalization module, which determines the unique identifier of each protein identified by the NER module; (iii) a PPI extraction module, which extracts PPI information from the biomedical literature; and (iv) a PPI visualization module, which displays the extracted PPI information as a graph. Figure 1 shows the architecture of PPIExtractor.

A total of 127,217 abstracts were downloaded from the PubMed website (http://www.ncbi.nlm.nih.gov/pubmed) with the query string "((Saccharomyces cerevisiae) OR yeast) AND protein", and PPIExtractor extracted 126,165 protein interactions from these abstracts.

Most protein names in the PPI databases are systematic names of nuclear-encoded ORFs, which begin with the letter 'Y' (for 'Yeast'), whereas protein names in PubMed abstracts usually are not. We therefore built a yeast protein alias list with about 6,000 entries from the UniProt website (http://www.uniprot.org/uniprot/?query=yeast&sort=score) and used it to convert the protein names in PubMed abstracts to the systematic names of nuclear-encoded ORFs.
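
The name conversion step can be sketched as a case-insensitive dictionary lookup. This is a minimal illustration under stated assumptions, not PPIExtractor's actual normalization code: `build_alias_map`, `to_systematic` and `normalize_interactions` are hypothetical helpers, the alias list is assumed to be already parsed into (systematic name, aliases) pairs, the regular expression assumes the standard SGD pattern for nuclear ORF systematic names (for example YBR160W), and dropping pairs with an unmapped name is our assumption, since the text does not say how such pairs are handled.

```python
import re

# Systematic names of nuclear-encoded yeast ORFs (assumed SGD pattern):
# 'Y', chromosome letter A-P, arm L/R, three digits, strand W/C,
# optional suffix such as "-A".
SYSTEMATIC_NAME = re.compile(r"^Y[A-P][LR]\d{3}[WC](-[A-Z])?$")


def build_alias_map(entries):
    """Build a case-insensitive alias -> systematic-name lookup.

    `entries` is an iterable of (systematic_name, aliases) pairs, assumed to be
    parsed beforehand from the downloaded UniProt alias list.
    """
    alias_map = {}
    for systematic, aliases in entries:
        alias_map[systematic.upper()] = systematic
        for alias in aliases:
            alias_map[alias.upper()] = systematic
    return alias_map


def to_systematic(name, alias_map):
    """Return the systematic ORF name for `name`, or None if it is unknown."""
    key = name.strip().upper()
    if SYSTEMATIC_NAME.match(key):
        return key  # already a systematic name
    return alias_map.get(key)


def normalize_interactions(pairs, alias_map):
    """Convert literature PPIs to systematic names; drop pairs with unmapped names."""
    normalized = []
    for a, b in pairs:
        sa, sb = to_systematic(a, alias_map), to_systematic(b, alias_map)
        if sa is not None and sb is not None:
            normalized.append((sa, sb))
    return normalized
```

For instance, with the alias entries ("YBR160W", ["CDC28"]) and ("YFL039C", ["ACT1"]), the literature pair ("Cdc28", "act1") becomes ("YBR160W", "YFL039C"), while a pair containing a name that is neither systematic nor in the alias list is dropped.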

PPI datasets

Three yeast PPI datasets, DIP, Krogan and Gavin, are used in our work. Their details are shown in Table 1. For each dataset, both an original PPI network and denoising PPI networks are built to verify the effectiveness of our method. The original PPI networks are the three yeast PPI datasets mentioned above, and the denoising PPI networks are filtered versions of these datasets in which low-reliability interactions are removed with different denoising thresholds. This filtering is needed because protein interaction data produced by high-throughput experiments are often associated with high false positive and false negative rates. We therefore measure the reliability of each interaction with a method based on both the semantic and the topological similarity of the two proteins, using GO (Gene Ontology Consortium [14]) annotations from SGD [15]. The reliability of a PPI is defined in formula (1):

Table 1 Properties of three yeast PPI datasets.


\mathrm{rel}(m, n) = \sqrt{-|C(m, n)| \times \log\left(\frac{\min_{i} |T_{i}(m, n)|}{T_{\max}}\right) + NE(m, n)}

(1)

where C(m, n) is the set of GO terms whose annotations include both proteins m and n, and |C(m, n)| is the number of terms in it; T_i(m, n) is the set of proteins annotated with GO term g_i, one of the terms in C(m, n), and |T_i(m, n)| is its size; and T_max is the maximum number of proteins annotated with any GO term. The specificity of a GO term can be quantified by the ratio of its annotation size (|T_i(m, n)|) to T_max, i.e., a GO term is regarded as more specific if fewer proteins are annotated with it. NE(m, n) is the number of neighbors shared by m and n. Formula (1) implies that the interaction between m and n is more reliable if the GO terms they share are more specific, or if they share more neighbors or more GO terms. The details of the denoising PPI networks are shown in Table 2.

Table 2 Properties of denoising PPI networks with different denoising thresholds.

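
Formula (1) can be computed directly from GO annotations and the network's adjacency lists. The sketch below is a minimal illustration under several assumptions that the text does not fix: the logarithm is natural, a pair with no shared GO term gets a semantic contribution of 0 (so rel(m, n) reduces to the square root of NE(m, n)), and "low reliability" is taken to mean rel(m, n) below the denoising threshold. The function names and data layout are hypothetical.

```python
import math


def reliability(m, n, annotations, term_size, t_max, neighbors):
    """Reliability rel(m, n) of formula (1).

    annotations: dict protein -> set of GO terms annotating it
    term_size:   dict GO term -> number of proteins annotated with it (|T_i|)
    t_max:       largest annotation size over all GO terms (T_max)
    neighbors:   dict protein -> set of interaction partners
    """
    shared_terms = annotations.get(m, set()) & annotations.get(n, set())  # C(m, n)
    semantic = 0.0  # assumption: no shared GO term -> no semantic contribution
    if shared_terms:
        # Annotation size of the most specific shared GO term.
        min_size = min(term_size[g] for g in shared_terms)
        # min_size <= t_max, so log(...) <= 0 and this term is non-negative.
        semantic = -len(shared_terms) * math.log(min_size / t_max)
    # NE(m, n): number of common neighbors of m and n.
    common = len(neighbors.get(m, set()) & neighbors.get(n, set()))
    return math.sqrt(semantic + common)


def denoise(edges, threshold, annotations, term_size, t_max, neighbors):
    """Keep only interactions whose reliability reaches the denoising threshold."""
    return [(m, n) for m, n in edges
            if reliability(m, n, annotations, term_size, t_max, neighbors) >= threshold]
```

Because the semantic term grows with the number of shared GO terms and with their specificity (a small |T_i| relative to T_max), two proteins sharing one rare GO term can outrank a pair that only shares a few neighbors.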

Integration of the extracted PPI data into the PPI networks

PPIExtractor assigns each PPI extracted from the biomedical literature a weight representing its reliability [13]. In our study, only PPIs whose weights are equal to or higher than an integrating threshold are integrated into the original PPI dataset. In addition, both proteins of a new PPI must already exist in the PPI dataset. The numbers of PPIs added to the original PPI networks with different integrating thresholds are shown in Table 3.

Table 3 The amounts of the PPIs added into the original PPI networks with different integrating thresholds.

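
The integration rule can be sketched as a filter over the extracted, weighted PPIs. This is a minimal sketch, assuming the extracted PPIs arrive as (protein, protein, weight) triples whose names are already converted to systematic names; `integrate` is a hypothetical helper, and skipping self-interactions and counting only edges not already present in the network are our assumptions.

```python
def integrate(network_edges, extracted, threshold):
    """Add literature PPIs to a PPI network.

    network_edges: iterable of (protein_a, protein_b) pairs from the PPI dataset
    extracted:     iterable of (protein_a, protein_b, weight) triples, where the
                   weight is the reliability score assigned by the extractor
    threshold:     integrating threshold; PPIs with weight >= threshold are kept
    Returns (set of undirected edges of the extended network, number added).
    """
    # Undirected edges as frozensets; self-loops skipped (our assumption).
    edges = {frozenset(e) for e in network_edges if e[0] != e[1]}
    proteins = {p for e in edges for p in e}
    added = 0
    for a, b, weight in extracted:
        # Both proteins must already exist in the PPI dataset.
        if weight >= threshold and a != b and a in proteins and b in proteins:
            edge = frozenset((a, b))
            if edge not in edges:  # count only genuinely new interactions
                edges.add(edge)
                added += 1
    return edges, added
```

With the threshold -0.6, an extracted PPI of weight -0.5 between two proteins already in the network is added, while one of weight -0.8, or one involving a protein absent from the network, is not.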

For the denoising PPI networks, only extracted PPIs with weights equal to or higher than the integrating threshold of -0.6 are added, because SLPC performs best on the original PPI networks with this threshold. Moreover, when integrated into the denoising PPI networks, these PPIs are also filtered with the different denoising thresholds. The numbers of PPIs added with different denoising thresholds are shown in Table 4.

Table 4 The amounts of the PPIs added into the denoising PPI networks with different denoising thresholds.


Protein complexes detection with SLPC

In our work, a supervised learning protein complex detection (SLPC) method is employed to predict protein complexes from PPI networks. Currently, most protein complex detection methods are unsupervised and do not use information from known complexes. However, many complexes have already been identified in protein complex research, and they can serve as prior knowledge for complex detection. In previous work, we presented SLPC to predict protein complexes [8]. SLPC uses the following features: graph density [3], degree statistics, edge weight statistics, clustering coefficient [16], and topologic change [17]. Experimental evaluations show that SLPC achieves better performance than other existing protein complex detection methods. The SLPC algorithm is shown in Table 5, and more details are provided in [8].

Table 5 Protein complex detection algorithm.

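
To make the feature list more concrete, the following sketch computes simple versions of four of the listed features for a candidate subgraph. It is an illustration only, assuming an undirected weighted network stored as a dict keyed by frozenset edges: the exact feature definitions used by SLPC are given in [8] and may differ, the topologic change feature [17] is omitted, and `subgraph_features` is a hypothetical name. Graph density follows the usual 2|E| / (|V|(|V| - 1)) definition for a simple undirected graph.

```python
from statistics import mean, pvariance


def subgraph_features(nodes, weights):
    """Compute a few topological features of a candidate complex.

    nodes:   non-empty set of proteins in the candidate subgraph
    weights: dict frozenset({a, b}) -> edge weight for the whole network
    """
    nodes = set(nodes)
    # Edges with both endpoints inside the candidate (self-loops ignored).
    inner = {e: w for e, w in weights.items() if len(e) == 2 and e <= nodes}
    adj = {v: set() for v in nodes}
    for e in inner:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    k = len(nodes)
    density = 2 * len(inner) / (k * (k - 1)) if k > 1 else 0.0
    degrees = [len(adj[v]) for v in nodes]
    edge_weights = list(inner.values()) or [0.0]

    def local_cc(v):
        # Fraction of pairs of v's neighbors that are themselves connected.
        nbrs = list(adj[v])
        d = len(nbrs)
        if d < 2:
            return 0.0
        links = sum(1 for i in range(d) for j in range(i + 1, d)
                    if nbrs[j] in adj[nbrs[i]])
        return 2 * links / (d * (d - 1))

    return {
        "density": density,
        "degree_mean": mean(degrees),
        "degree_max": max(degrees),
        "degree_var": pvariance(degrees),
        "weight_mean": mean(edge_weights),
        "weight_var": pvariance(edge_weights),
        "clustering_mean": mean(local_cc(v) for v in nodes),
    }
```

In a supervised setting, such feature vectors computed for known complexes and for random subgraphs would form the positive and negative training examples; how SLPC builds its training set is described in [8].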

Experiments and results

Gold standard protein complexes

We constructed the gold standard protein complexes by combining MIPS [18], Aloy [19], SGD [15] and TAP06 [9]. Proteins absent from the corresponding PPI networks are filtered out of the gold standard. In addition, only protein complexes containing at least two distinct proteins are retained, as research shows that most protein complexes include more than one protein [20]. The details of the gold standard protein complexes for the original and denoising PPI networks are shown in Tables 6 and 7, respectively.

Table 6 The details of the gold standard protein complexes of original PPI networks.


Table 7 The details of the gold standard protein complexes of denoising PPI networks with different denoising thresholds.

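
The two filtering rules for the gold standard can be sketched as follows. This is a minimal sketch, assuming each source is a list of protein sets that already use systematic names; `build_gold_standard` is a hypothetical helper, and removing exact duplicates that occur in several sources is our assumption rather than a step stated in the text.

```python
def build_gold_standard(sources, network_proteins, min_size=2):
    """Merge reference complexes and restrict them to one PPI network.

    sources:          iterable of complex collections (each an iterable of
                      protein sets), e.g. MIPS, Aloy, SGD and TAP06
    network_proteins: set of proteins present in the PPI network
    """
    gold, seen = [], set()
    for source in sources:
        for complex_ in source:
            # Drop proteins absent from the corresponding PPI network.
            members = frozenset(complex_) & network_proteins
            # Keep complexes with at least two distinct proteins and skip
            # exact duplicates found in more than one source (assumption).
            if len(members) >= min_size and members not in seen:
                seen.add(members)
                gold.append(members)
    return gold
```

A complex that shrinks to a single protein after the network filter is discarded, even if it was larger in the source database.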

Evaluation metrics

In our study, F-score, accuracy (Acc) and maximum matching ratio (MMR) are used as evaluation metrics. The neighborhood affinity score NA(A, B), defined as follows, is used to evaluate the similarity of two protein complexes A and B:

NA(A, B) = \frac{|V_{A} \cap V_{B}|^{2}}{|V_{A}| \times |V_{B}|}

(2)

where V_A and V_B are the sets of proteins in complexes A and B. If NA(A, B) is greater than or equal to 0.25, complexes A and B are regarded as matching.
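
As a worked example, complexes A = {a, b, c} and B = {b, c, d, e} share two proteins, so NA(A, B) = 2^2 / (3 × 4) ≈ 0.33 ≥ 0.25 and the two complexes match. A minimal Python sketch of the score and the matching rule (the function names are hypothetical; complexes are assumed to be collections of protein identifiers):

```python
def neighborhood_affinity(a, b):
    """NA(A, B) = |V_A ∩ V_B|^2 / (|V_A| * |V_B|) for two protein sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) ** 2 / (len(a) * len(b))


def matches(a, b, threshold=0.25):
    """Two complexes are regarded as matching when NA(A, B) >= 0.25."""
    return neighborhood_affinity(a, b) >= threshold
```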

F-score, a popular metric for evaluating complex detection methods, is used as the first performance measure. It is computed as follows:

N_{cb} = |\{b \mid b \in B, \exists p \in P, NA(p, b) \geq 0.25\}|

(3)

N_{cp} = |\{p \mid p \in P, \exists b \in B, NA(p, b) \geq 0.25\}|

(4)

\mathrm{Precision} = \frac{N_{cp}}{|P|}, \quad \mathrm{Recall} = \frac{N_{cb}}{|B|}

(5)

\mathrm{F\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(6)

where P and B are the sets of predicted and gold standard complexes, respectively; N_cb is the number of gold standard complexes that match at least one predicted complex; N_cp is the number of predicted complexes that match at least one gold standard complex; and F-score is the harmonic mean of precision and recall.
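
Formulas (3) to (6) translate directly into code. The sketch below is self-contained (it repeats the NA score) and assumes complexes are given as collections of protein identifiers; `f_score` is a hypothetical helper name, and returning 0 when both precision and recall are 0 is our convention for the otherwise undefined case.

```python
def neighborhood_affinity(a, b):
    """NA(A, B) = |V_A ∩ V_B|^2 / (|V_A| * |V_B|)."""
    a, b = set(a), set(b)
    return len(a & b) ** 2 / (len(a) * len(b)) if a and b else 0.0


def f_score(predicted, gold, threshold=0.25):
    """Return (precision, recall, F-score) following formulas (3)-(6)."""
    # N_cp: predicted complexes matching at least one gold standard complex.
    n_cp = sum(1 for p in predicted
               if any(neighborhood_affinity(p, b) >= threshold for b in gold))
    # N_cb: gold standard complexes matching at least one predicted complex.
    n_cb = sum(1 for b in gold
               if any(neighborhood_affinity(p, b) >= threshold for p in predicted))
    precision = n_cp / len(predicted) if predicted else 0.0
    recall = n_cb / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For predicted complexes {A, B, C} and {X, Y} against gold standard complexes {A, B}, {D, E} and {X, Y, Z}, both predictions match and two of the three references are matched, so precision = 1, recall = 2/3 and F-score = 0.8.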

The second measure is the geometric accuracy introduced by Brohée and van Helden [21], which is the geometric mean of the clustering-wise sensitivity (Sn) and the clustering-wise positive predictive value (PPV). A high Sn indicates that the predicted complexes cover the proteins in the gold standard complexes well, and a high PPV indicates that the predicted complexes are likely to be true protein complexes. Let n be the number of gold standard complexes and m the number of predicted complexes; T_ij is the number of proteins found in both gold standard complex i and predicted complex j, and N_i is the number of proteins in gold standard complex i. Sn, PPV and Acc are defined as follows:

Sn = \frac{\sum_{i=1}^{n} \max_{j} T_{ij}}{\sum_{i=1}^{n} N_{i}}

PPV = \frac{\sum_{j=1}^{m} \max_{i} T_{ij}}{\sum_{j=1}^{m} T_{\cdot j}}

T_{\cdot j} = \sum_{i=1}^{n} T_{ij}

Acc = \sqrt{Sn \times PPV}
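
A sketch of Sn, PPV and Acc built from the overlap matrix T_ij, following the formulas above. It assumes complexes are given as collections of protein identifiers and that there is at least one non-empty gold standard complex; `geometric_accuracy` is a hypothetical helper name.

```python
import math


def geometric_accuracy(predicted, gold):
    """Return (Sn, PPV, Acc) computed from the overlap matrix T_ij."""
    gold = [set(c) for c in gold]
    predicted = [set(c) for c in predicted]
    # T[i][j]: proteins shared by gold standard complex i and predicted complex j.
    T = [[len(g & p) for p in predicted] for g in gold]
    # Sn: best overlap of each gold complex, relative to sum of N_i.
    sn = sum(max(row) if row else 0 for row in T) / sum(len(g) for g in gold)
    # PPV: best overlap of each predicted complex, relative to sum of T_.j.
    cols = list(zip(*T))
    ppv_den = sum(sum(col) for col in cols)
    ppv = sum(max(col) for col in cols) / ppv_den if ppv_den else 0.0
    return sn, ppv, math.sqrt(sn * ppv)
```

For gold standard complexes {A, B, C} and {D, E} and predicted complexes {A, B} and {C, D, E}, T = [[2, 1], [0, 2]], so Sn = 4/5, PPV = 4/5 and Acc = 0.8.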

The third metric is the maximum matching ratio (MMR) [6], which is based on a maximal one-to-one mapping between gold standard complexes and predicted complexes:

MMR = \frac{\sum_{i=1}^{n} \max_{j} NA(i, j)}{n}

where n is the number of gold standard complexes, m is the number of predicted complexes, and j ranges over the predicted complexes; because the mapping is one-to-one, each predicted complex can be matched to at most one gold standard complex [6]. MMR offers a natural, intuitive way to compare predicted complexes with a gold standard, and it explicitly penalizes cases in which a reference complex is split into two or more parts in the predicted set, as only one of its parts is allowed to match the correct reference [6].
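
The formula above writes the per-reference maximum; the one-to-one mapping described in [6] corresponds to a maximum-weight bipartite matching between gold standard and predicted complexes, with NA scores as edge weights. The sketch below assumes that reading. It solves the matching with a compact pure-Python Hungarian algorithm, a standard technique chosen here for self-containment (a library routine such as SciPy's `linear_sum_assignment` with `maximize=True` would serve equally), and `mmr` is a hypothetical helper name.

```python
def neighborhood_affinity(a, b):
    """NA(A, B) = |V_A ∩ V_B|^2 / (|V_A| * |V_B|)."""
    a, b = set(a), set(b)
    return len(a & b) ** 2 / (len(a) * len(b)) if a and b else 0.0


def _max_weight_assignment(weights):
    """Hungarian algorithm on a square k x k matrix.

    Returns row_to_col such that sum(weights[i][row_to_col[i]]) is maximal.
    Weights are negated so the classic minimum-cost version can be used.
    """
    k = len(weights)
    inf = float("inf")
    # 1-indexed cost matrix; row 0 and column 0 are unused sentinels.
    cost = [[0.0] * (k + 1)] + [[0.0] + [-w for w in row] for row in weights]
    u, v = [0.0] * (k + 1), [0.0] * (k + 1)  # dual potentials
    p, way = [0] * (k + 1), [0] * (k + 1)    # p[j]: row matched to column j
    for i in range(1, k + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (k + 1), [False] * (k + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, k + 1):
                if not used[j]:
                    cur = cost[i0][j] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(k + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    row_to_col = [0] * k
    for j in range(1, k + 1):
        row_to_col[p[j] - 1] = j - 1
    return row_to_col


def mmr(predicted, gold):
    """Maximum matching ratio with a one-to-one gold/predicted matching."""
    n, m = len(gold), len(predicted)
    if n == 0:
        return 0.0
    k = max(n, m)
    # Pad to a square matrix with zero-weight dummy complexes.
    w = [[neighborhood_affinity(gold[i], predicted[j]) if i < n and j < m else 0.0
          for j in range(k)] for i in range(k)]
    assignment = _max_weight_assignment(w)
    return sum(w[i][assignment[i]] for i in range(n)) / n
```

When a single predicted complex {A, B, C, D, E} covers two gold standard complexes {A, B, C} and {D, E}, it can be matched to only one of them, so MMR = 0.6 / 2 = 0.3, whereas taking the maximum for each reference independently would give 0.5.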

The Acc measure explicitly penalizes predicted complexes that do not match any reference complex. However, gold standard sets of protein complexes are often incomplete [22]. As a consequence, predicted complexes that do not match any known reference complex may still exhibit high functional similarity or be highly co-localized, and could therefore be prospective candidates for further in-depth analysis. In other words, a predicted complex that does not match a reference complex is not necessarily an undesired result, and optimizing for geometric accuracy might prevent us from detecting novel complexes in a PPI dataset. Therefore, F-score and MMR are used as the main metrics in the performance comparison, and Acc is used only as an auxiliary metric.

The performances of SLPC on original PPI networks

We first tested SLPC on the three original PPI networks, i.e., DIP, Krogan and Gavin. The F-score, accuracy and MMR results are shown in Tables 8, 9 and 10, respectively. On these networks, performance under all three metrics keeps improving as the integrating threshold decreases from 0 to -0.6. With the threshold -0.6, SLPC achieves the highest average improvements on the three original PPI networks: 4.76, 6.81 and 15.75 percentage units in F-score, accuracy and MMR, respectively. This shows that introducing PPIs extracted from literature into the original PPI datasets can boost performance. The reason is that lowering the integrating threshold (down to -0.6) integrates more new PPIs into the original PPI networks, which relieves the sparsity problem of PPI networks. As shown in Table 11, in most cases the average size of the complexes predicted from the extended PPI networks is much closer to that of the gold standard protein complexes than the average size of those predicted from the original PPI networks, which helps explain why SLPC performs better on the extended PPI networks than on the original ones.

Table 8 The F-score performances of SLPC on original PPI networks with different integrating thresholds.


Table 9 The Accuracy performances of SLPC on original PPI networks with different integrating thresholds.


Table 10 The MMR performances of SLPC on original PPI networks with different integrating thresholds.


Table 11 The details of predicted complexes of SLPC on original PPI networks with different integrating thresholds.


However, Tables 8 and 10 show that F-score and MMR begin to decline after reaching their highest values. This is because lowering the integrating threshold further introduces more unreliable PPIs, which deteriorates the performance of SLPC.

The performances of SLPC on denoising PPI networks

Denoising PPI networks are those from which low-reliability PPIs have been removed, as discussed in the section PPI datasets. Denoising extended PPI networks are denoising PPI networks into which PPIs extracted from literature have been integrated. More specifically, the new PPIs are filtered with the same denoising thresholds as the PPIs in the original PPI networks and then integrated into the corresponding denoising PPI networks.

The performance of SLPC on denoising PPI networks is shown in Tables 12, 13 and 14. For every denoising threshold, SLPC performs better on the denoising extended PPI network than on the corresponding denoising PPI network. With the denoising threshold 0.9, SLPC achieves the highest average improvements of the denoising extended PPI networks over the denoising PPI networks: 3.91, 4.61 and 12.10 percentage units in F-score, accuracy and MMR, respectively. This shows, once again, that introducing PPIs extracted from literature can boost the performance of complex detection methods.

Table 12 The F-score performances of SLPC on denoising PPI networks with different denoising thresholds.


Table 13 The Accuracy performances of SLPC on denoising PPI networks with different denoising thresholds.


Table 14 The MMR performances of SLPC on denoising PPI networks with different denoising thresholds.


Tables 12, 13 and 14 also show that the performance of SLPC on both the denoising PPI networks and the denoising extended PPI networks begins to decline after reaching its highest value. This is because a higher denoising threshold filters more PPIs out of the original PPI networks, which may remove some real PPIs.

We also tested ClusterONE, the state-of-the-art complex detection method, with its parameters set as described in [6]. With the denoising threshold 0.9, it achieves average improvements of 0.31, 0.40 and 1.29 percentage units in F-score, accuracy and MMR, respectively, on the denoising extended PPI networks over the denoising PPI networks. This indicates that introducing PPIs extracted from literature can also boost the performance of ClusterONE. In addition, the experimental results show that SLPC achieves better performance than ClusterONE: with the denoising threshold 0.9, the average improvements of SLPC over ClusterONE are 26.02 and 22.40 percentage units in F-score and MMR, respectively.

Conclusions

Protein complexes, which are molecular aggregates of proteins assembled through multiple protein interactions, are fundamental units of macromolecular organization and play crucial roles in integrating individual gene products to perform cellular functions. Large amounts of PPI data generated by high-throughput experimental techniques can be used to predict protein complexes from PPI networks. At the same time, numerous accurate PPIs can be found in the rapidly growing biomedical literature, since they are reported by biomedical experts. Integrating them with existing PPI datasets may reduce the sparsity of PPI networks and thereby improve complex detection performance.

In this paper, we present an approach that introduces PPIs from biomedical literature into existing PPI networks and applies a supervised learning method to protein complex detection. In this approach, new PPI networks are constructed by integrating PPI datasets with the large and readily available PPI data from biomedical literature, and less reliable PPIs are then filtered out based on the semantic and topological similarity of the two interacting proteins. Finally, the supervised learning protein complex detection method, SLPC, which makes full use of the information in available known complexes, is applied to detect protein complexes on the new PPI networks.

On the original extended PPI networks, the best average improvements of 4.76, 6.81 and 15.75 percentage units in F-score, accuracy and MMR, respectively, are achieved. On the denoising extended PPI networks, the best average improvements are 3.91, 4.61 and 12.10 percentage units in F-score, accuracy and MMR, respectively. All these results show that introducing PPIs extracted from literature into the PPI datasets can boost performance substantially. We attribute this to the integration of PPI data from biomedical literature alleviating the sparsity problem of PPI networks. The results also show that our method outperforms ClusterONE, the state-of-the-art method, because it makes full use of the information in available known complexes. In summary, our complex detection method, which is based on supervised learning and integrates PPI data from biomedical literature, can achieve better performance than other complex detection methods.

References

[1] Liu G, Wong L, Chua HN: Complex discovery from weighted PPI networks. Bioinformatics. 2009, 25: 1891-1897. 10.1093/bioinformatics/btp311.
[2] Wu M, Li X, Kwoh CK, Ng SK: A core-attachment based method to detect protein complexes in PPI networks. BMC Bioinformatics. 2009, 10: 169. 10.1186/1471-2105-10-169.
[3] Bader GD, Hogue CW: An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics. 2003, 4: 2. 10.1186/1471-2105-4-2.
[4] Enright AJ, Van Dongen S, Ouzounis CA: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30: 1575-1584. 10.1093/nar/30.7.1575.
[5] Adamcsek B, Palla G, Farkas IJ, Derényi I, Vicsek T: CFinder: locating cliques and overlapping modules in biological networks. Bioinformatics. 2006, 22: 1021-1023. 10.1093/bioinformatics/btl039.
[6] Nepusz T, Yu H, Paccanaro A: Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012, 9: 471-472. 10.1038/nmeth.1938.
[7] Qi YJ, Balem F, Faloutsos C, Klein-Seetharaman J, Bar-Joseph Z: Protein complex identification by supervised graph local clustering. Bioinformatics. 2008, 24: i250-i258. 10.1093/bioinformatics/btn164.
[8] Yu F, Yang Z, Tang N, Lin H, Wang J: Predicting protein complex in protein interaction network - a supervised learning based method. BMC Syst Biol. 2014, 8 (Suppl 3): S4. 10.1186/1752-0509-8-S3-S4.
[9] Gavin AC, Aloy P, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636. 10.1038/nature04532.
[10] Krogan NJ, Cagney G, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670.
[11] Xenarios I, Salwinski L, et al: DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30: 303-305. 10.1093/nar/30.1.303.
[12] Hart GT, Ramani AK, Marcotte EM: How complete are current yeast and human protein interaction networks? Genome Biol. 2006, 7: 120. 10.1186/gb-2006-7-11-120.
[13] Yang Z, Zhao Z, Li Y, Hu Y, Lin H: PPIExtractor: a protein interaction extraction and visualization system for biomedical literature. IEEE Trans Nanobioscience. 2013, 12 (3): 173-181.
[14] Ashburner M, Ball CA, et al: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
[15] Dwight SS, Harris MA, et al: Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res. 2002, 30: 69-72. 10.1093/nar/30.1.69.
[16] Stelzl U, Worm U, et al: A human protein-protein interaction network: a resource for annotating the proteome. Cell. 2005, 122: 957-968. 10.1016/j.cell.2005.08.029.
[17] Chen L, Shi X, et al: Identifying protein complexes using hybrid properties. J Proteome Res. 2009, 8: 5212-5218. 10.1021/pr900554a.
[18] Mewes HW, Amid C, et al: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res. 2004, 32: D41-D44. 10.1093/nar/gkh092.
[19] Aloy P, Böttcher B, et al: Structure-based assembly of protein complexes in yeast. Science. 2004, 303: 2026-2029. 10.1126/science.1092645.
[20] Dudley AM, Janse DM, et al: A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol. 2005, 1: E1-E11.
[21] Brohée S, van Helden J: Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics. 2006, 7: 488. 10.1186/1471-2105-7-488.
[22] Jansen R, Gerstein M: Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. Curr Opin Microbiol. 2004, 7: 535-545. 10.1016/j.mib.2004.08.012.


Acknowledgements

This work is supported by grants from the Natural Science Foundation of China (grant no. 61070098, 61272373 and 61340020), Trans-Century Training Programme Foundation for the Talents by the Ministry of Education of China (grant no. NCET-13-0084) and the Fundamental Research Funds for the Central Universities (grant no. DUT13JB09 and DUT14YQ213). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Declarations

Publication of this article was funded by the following grants: the Natural Science Foundation of China (grant no. 61070098, 61272373 and 61340020), Trans-Century Training Programme Foundation for the Talents by the Ministry of Education of China (grant no. NCET-13-0084) and the Fundamental Research Funds for the Central Universities (grant no. DUT13JB09 and DUT14YQ213).

This article has been published as part of BMC Bioinformatics Volume 16 Supplement 12, 2015: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/16/S12.

Author information

Authors and Affiliations

College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Feng Ying Yu, Zhi Hao Yang, Yuan Yuan Sun, Hong Fei Lin & Jian Wang
College of Computing & Informatics, Drexel University, Philadelphia, PA, 19104, USA
Xiao Hua Hu


Corresponding author

Correspondence to Zhi Hao Yang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

ZHY and FYY conceived of the study, carried out its design and drafted the manuscript. FYY performed the experiments. FYY, XHH, HFL, and JW participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Yu, F.Y., Yang, Z.H., Hu, X.H. et al. Protein complex detection in PPI networks based on data integration and supervised learning method. BMC Bioinformatics 16 (Suppl 12), S3 (2015). https://doi.org/10.1186/1471-2105-16-S12-S3
