Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition

Liangwei Yang; Hui Gao; Keyu Wu; Haotian Zhang; Changyu Li; Lixia Tang

Abstract

Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a class of lectins, play a key role in interactions between tumor cells and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it can inform the future direction of cancer therapy.

Objective: To develop an accurate, practical and time-saving tool to identify cancerlectins. A novel sequence-based method is proposed, along with an accompanying web server that provides access to it.

Methods: First, protein features were extracted using a new feature-construction scheme termed g-gap tripeptide composition. A proposed cascade linear discriminant analysis (Cascade LDA) was then used to alleviate the difficulties of high dimensionality, with analysis of variance (ANOVA) as the feature-importance criterion. Finally, a support vector machine (SVM) was used as the classifier to identify cancerlectins.
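The abstract does not define the g-gap tripeptide composition, so the following is only a minimal sketch under a common assumption: a g-gap tripeptide is taken to be three residues at positions i, i+g+1 and i+2g+2, and the feature vector holds the normalized frequency of each of the 20^3 = 8000 possible triples. The function name `g_gap_tripeptide_composition` and the choice to skip non-standard residues are illustrative and not taken from the paper.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_tripeptide_composition(sequence, g=0):
    """Return a dict mapping each of the 8000 tripeptides to its frequency.

    Assumption: residues of a g-gap tripeptide sit at positions
    i, i + g + 1 and i + 2g + 2, so g = 0 gives ordinary adjacent tripeptides.
    """
    step = g + 1          # distance between consecutive residues of a triple
    span = 2 * step       # distance between the first and the last residue
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    total = 0
    for i in range(len(sequence) - span):
        tri = sequence[i] + sequence[i + step] + sequence[i + span]
        if tri in counts:  # skip triples containing non-standard residues
            counts[tri] += 1
            total += 1
    if total == 0:         # sequence too short for this gap
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


features = g_gap_tripeptide_composition("ACDEFG", g=1)
print(len(features), features["ADF"], features["CEG"])
```

Under this assumed definition, each value of g yields its own 8000-dimensional vector, which illustrates why a dimensionality-reduction step such as the Cascade LDA named in the Methods is needed before classification.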

Results: In jackknife cross-validation, the proposed method achieved an accuracy of 91.34%, a sensitivity of 89.89%, a specificity of 92.48% and a Matthews correlation coefficient of 0.8318 using only 13 fusion features, which is superior to other published methods in this domain.
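For readers unfamiliar with the four reported measures, the sketch below computes them from a binary confusion matrix using their standard definitions. The counts in the usage line are hypothetical and are not the paper's results; the function name `binary_metrics` is likewise illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not taken from the paper).
print(binary_metrics(tp=90, fn=10, tn=95, fp=5))
```

In a jackknife (leave-one-out) test, each sequence is held out once and predicted by a model trained on all the others, so the counts are accumulated over as many predictions as there are sequences.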

Conclusion: In this study, a new method based only on the primary structure of proteins is proposed, and experimental results show that it could be a promising tool for identifying cancerlectins. An open-access web server is made available to facilitate related work.

Keywords: Cancerlectin, cascade LDA, g-gap tripeptide composition, SVM, protein, ANOVA.


Cite As

Current Bioinformatics
