ML-rRBF-ECOC: A Multi-Label Learning Classifier for Predicting Protein Subcellular Localization with Both Single and Multiple Sites

Page: [359 - 365] Pages: 7

Abstract

Background: The subcellular localization of a protein is closely related to its functions and interactions. Growing evidence shows that proteins may simultaneously exist at, or move between, two or more subcellular locations. Predicting protein subcellular localization is therefore an important but challenging problem.

Observation: Most existing methods for predicting protein subcellular localization assume that a protein resides at a single site. Although a few methods have been proposed to handle proteins with multiple sites, they do not take correlations between subcellular localizations into account effectively. In this paper, we propose an integrated method for predicting the subcellular localization of proteins with both single and multiple sites.

Methods: First, we extend the Multi-Label Radial Basis Function (ML-RBF) method to a regularized version and augment the first layer of ML-RBF to take local correlations between subcellular localizations into account. Second, we embed the modified ML-RBF into a multi-label Error-Correcting Output Codes (ECOC) method to further capture dependencies among subcellular localizations. We name the resulting method ML-rRBF-ECOC. Finally, we evaluate ML-rRBF-ECOC on three benchmark datasets.
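
As a rough illustration of the first step only, the following Python sketch fits an RBF network for multi-label data with a regularized output layer. It is a minimal sketch under stated assumptions, not the authors' implementation: the ridge (L2) penalty stands in for whatever regularizer the paper uses, the correlation-aware augmentation of the first layer and the ECOC stage are omitted, and the helper names (`kmeans`, `rbf_features`, `fit_ml_rbf_ridge`, `predict`) and the parameters `alpha` and `lam` are hypothetical. It also assumes every label has at least one positive example and that at least two centers are produced in total.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means returning k centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for j in range(k):
            if np.any(assign == j):  # keep the old center if a cluster empties
                centers[j] = X[assign == j].mean(0)
    return centers

def rbf_features(X, centers, sigma):
    """First layer: Gaussian activations plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.hstack([np.ones((len(X), 1)), phi])

def fit_ml_rbf_ridge(X, Y, alpha=0.5, lam=1e-2, seed=0):
    """Fit an RBF network on a 0/1 label matrix Y with a ridge-penalized output layer.

    Assumption: ridge regression replaces the paper's (unspecified here) regularizer.
    """
    centers = []
    for label in range(Y.shape[1]):
        pos = X[Y[:, label] == 1]                   # instances carrying this label
        k = max(1, int(np.ceil(alpha * len(pos))))  # number of prototypes per label
        centers.append(kmeans(pos, k, seed=seed))
    centers = np.vstack(centers)
    # Shared kernel width: mean pairwise distance between all centers.
    diff = centers[:, None, :] - centers[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    sigma = dist[np.triu_indices(len(centers), 1)].mean()
    phi = rbf_features(X, centers, sigma)
    targets = 2.0 * Y - 1.0                         # map {0, 1} to {-1, +1}
    gram = phi.T @ phi + lam * np.eye(phi.shape[1])
    W = np.linalg.solve(gram, phi.T @ targets)      # closed-form ridge solution
    return centers, sigma, W

def predict(X, model):
    """Assign every label whose network output is positive."""
    centers, sigma, W = model
    return (rbf_features(X, centers, sigma) @ W > 0).astype(int)
```

Calling `fit_ml_rbf_ridge(X, Y)` on a feature matrix `X` and a binary label matrix `Y` (one column per subcellular location) returns a model tuple that `predict` can apply to new proteins. A protein predicted positive for more than one column is treated as a multi-site protein.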

Results: The results demonstrate that ML-rRBF-ECOC is highly competitive with the related multi-label learning method and with some state-of-the-art methods for predicting protein subcellular localization with multiple sites. Taking the dependency between subcellular localizations into account contributes to improved prediction performance.

Conclusion: The results also indicate that correlations between different subcellular localizations do exist. Our method can at least play a complementary role to existing methods for predicting protein subcellular localization with multiple sites.

Keywords: Subcellular localization, multiple sites, multi-label radial basis function, error-correcting output codes, multi-label, label correlations.
