An Integrated Prediction Method for Identifying Protein-Protein Interactions

Page: [271 - 286] Pages: 16

Abstract

Background: Protein-protein interactions (PPIs) play a key role in various biological processes. Many methods have been developed to predict protein-protein interactions and protein interaction networks. However, many existing approaches are limited because they rely on a large number of homologous proteins and interaction marks.

Methods: In this paper, we propose a novel integrated learning approach (RF-Ada-DF) with a sequence-based feature representation for identifying protein-protein interactions. Our method first constructs a sequence-based feature vector to represent each pair of proteins, via Multivariate Mutual Information (MMI) and Normalized Moreau-Broto Autocorrelation (NMBAC). Then, we feed the 638-dimensional features into an integrated learning model to distinguish interacting pairs from non-interacting pairs. This integrated model embeds Random Forest in the AdaBoost framework and turns weak classifiers into a single strong classifier. We also employ double fault detection to suppress over-adaptation during the training process.
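As an illustration of the autocorrelation part of the feature vector, the following is a minimal sketch of an NMBAC descriptor in its commonly used form: each residue is mapped to a physicochemical value normalized to zero mean and unit standard deviation over the 20 amino acids, and the descriptor for a given lag is the average product of values that lie that many positions apart. The Kyte-Doolittle hydropathy scale, the function names, and the lag range are assumptions made for this example; the abstract does not specify which properties or lags RF-Ada-DF uses, and the MMI features are not shown.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index. This scale is an illustrative assumption,
# not necessarily one of the properties used by the authors.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize_scale(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    mu = mean(scale.values())
    sigma = pstdev(scale.values())
    return {aa: (value - mu) / sigma for aa, value in scale.items()}


def nmbac(sequence, scale, max_lag):
    """Return one autocorrelation value per lag in 1..max_lag.

    For each lag, the value is the mean of P[i] * P[i + lag] over all valid
    positions i, where P holds the normalized property of each residue.
    Residues missing from the scale (for example 'X') raise a KeyError.
    """
    norm = normalize_scale(scale)
    values = [norm[aa] for aa in sequence]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


def pair_features(seq_a, seq_b, scale, max_lag):
    """Concatenate the descriptors of two proteins into one pair vector."""
    return nmbac(seq_a, scale, max_lag) + nmbac(seq_b, scale, max_lag)
```

Calling `pair_features(seq_a, seq_b, HYDROPATHY, max_lag)` yields `2 * max_lag` values for one property; repeating the computation over several properties and appending the MMI terms would give a longer pair vector of the kind described above.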

Results: To evaluate the performance of our method, we conduct several comprehensive tests for PPI prediction. On the H. pylori dataset, our method achieves 88.16% accuracy and 87.68% sensitivity, increasing accuracy by 0.57%. On the S. cerevisiae dataset, it achieves 95.77% accuracy and 93.36% sensitivity, increasing accuracy by 0.76%. On the Human dataset, it achieves 98.16% accuracy and 96.80% sensitivity, increasing accuracy by 0.6%. Experiments show that our method achieves better results than other outstanding methods for sequence-based PPI prediction. The datasets and code are available at https://github.com/guofei-tju/RF-Ada-DF.git.
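For reference, the two reported metrics follow the standard definitions: accuracy is the fraction of all pairs classified correctly, and sensitivity is the fraction of truly interacting pairs that are predicted as interacting. The sketch below computes both from binary labels, with 1 marking an interacting pair; the function names and the toy labels are illustrative assumptions and are not taken from the authors' code or data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = interacting)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all pairs that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of truly interacting pairs that are predicted as interacting."""
    return tp / (tp + fn)


# Toy labels, made up for this example only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn))
```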

Keywords: Protein-protein interaction, multivariate mutual information, random forest, AdaBoost framework, double fault detection, sensitivity.
