Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information

Page: [302 - 310] Pages: 9


Abstract

Background: DNA-binding proteins play an important role in multiple biomolecular functions. However, traditional experimental methods for identifying DNA-binding proteins are still time-consuming and extremely expensive.

Objective: In the past several years, various computational methods have been developed to detect DNA-binding proteins. However, most of them do not integrate multiple sources of information.

Methods: In this study, we propose a novel computational method that predicts DNA-binding proteins from sequence information using a two-step Multiple Kernel Support Vector Machine (MK-SVM). First, we extract several features and construct multiple kernels. Then, the kernels are linearly combined by Multiple Kernel Learning (MKL). Finally, an SVM model built on the combined kernel is used to predict DNA-binding proteins.
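
The two steps described in the Methods paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the kernel weights are learned, so the sketch assumes a simple heuristic in which each kernel is weighted by its centered alignment with the ideal kernel built from the labels. The toy feature views, the RBF kernel, the `gamma` value, and all function names are hypothetical and chosen only for illustration.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def center(K):
    """Center a Gram matrix: subtract row and column means, add the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_ij A[i][j] * B[i][j]."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K, y):
    """Centered alignment between K and the ideal kernel y y^T."""
    Kc = center(K)
    Yc = center([[yi * yj for yj in y] for yi in y])
    denom = math.sqrt(frobenius(Kc, Kc) * frobenius(Yc, Yc))
    return frobenius(Kc, Yc) / denom if denom > 0 else 0.0

def combine_kernels(kernels, y):
    """Step 1 (assumed heuristic): weight each kernel by its non-negative alignment."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    else:
        weights = [s / total for s in scores]
    n = len(y)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

# Hypothetical toy data: four "proteins", two feature views, labels +1 / -1.
y = [1, 1, -1, -1]
view_a = [[0.0], [0.1], [1.0], [1.1]]  # separates the two classes
view_b = [[0.0], [1.0], [0.1], [1.1]]  # groups samples across the classes

kernels = [rbf_kernel(view_a, gamma=1.0), rbf_kernel(view_b, gamma=1.0)]
weights, combined = combine_kernels(kernels, y)
print("kernel weights:", [round(w, 3) for w in weights])
```

Step 2 would train an SVM on `combined` as a precomputed Gram matrix, for example with scikit-learn's `SVC(kernel="precomputed")` or LIBSVM's precomputed-kernel mode. In this toy setup the informative view receives almost all of the weight, which is the behaviour a linear kernel combination is meant to provide.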

Results: The proposed method is tested on two benchmark data sets. Its performance is comparable to that of existing methods, and better on some data sets.

Conclusion: We conclude that MK-SVM is more suitable than a conventional SVM as the classifier for DNA-binding protein identification.

Keywords: DNA-binding proteins, feature extraction, support vector machine, multiple kernel learning, kernel alignment, binding sites.

