Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition

Page: [528 - 537] Pages: 10

Abstract

Background: Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins, a kind of lectin, play a key role in the interactions among tumor cells and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it can inform the future direction of cancer therapy.

Objective: To develop an accurate, practical and time-saving tool for identifying cancerlectins. A novel sequence-based method is proposed, along with an accompanying web server that provides access to the tool.

Methods: First, protein features were extracted using a new feature construction scheme termed g-gap tripeptide composition. Next, a proposed cascade linear discriminant analysis (Cascade LDA) was used to alleviate the difficulties of high dimensionality, with analysis of variance (ANOVA) as the feature importance criterion. Finally, a support vector machine (SVM) was used as the classifier to identify cancerlectins.
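
The first two steps can be illustrated with a minimal Python sketch. The abstract does not define the g-gap tripeptide composition, so the code assumes one common reading: three residues at positions i, i+g+1 and i+2(g+1), counted over the 20 standard amino acids and normalized into an 8000-dimensional frequency vector. The ANOVA criterion is sketched as the one-way F statistic (between-class over within-class variance) of a single feature. The function names are hypothetical and are not taken from the authors' implementation; the Cascade LDA and SVM steps are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20^3 = 8000 ordered residue triples, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def g_gap_tripeptide_composition(sequence, g):
    """Frequencies of residue triples whose members are g positions apart.

    Assumption: the same gap g separates the first and second residue and
    the second and third residue; g = 0 gives adjacent tripeptides.
    """
    step = g + 1
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(sequence) - 2 * step):
        triple = sequence[i] + sequence[i + step] + sequence[i + 2 * step]
        idx = INDEX.get(triple)
        if idx is not None:  # skip triples with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


def anova_f_score(groups):
    """One-way ANOVA F statistic of one feature.

    groups holds one list of feature values per class, for example
    [values_in_cancerlectins, values_in_non_cancerlectins].
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, g_gap_tripeptide_composition("ACDEFGH", 0) spreads its weight equally over the five tripeptides ACD, CDE, DEF, EFG and FGH. Features with a larger F statistic separate the two classes more strongly and would rank higher under this criterion.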

Results: In jackknife cross-validation, the proposed method achieved an accuracy of 91.34%, a sensitivity of 89.89%, a specificity of 92.48% and a Matthews correlation coefficient of 0.8318 using only 13 fusion features. This result is superior to those of other published methods in this domain.
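
The four reported measures follow from the confusion-matrix counts accumulated over the jackknife (leave-one-out) rounds, in which each sequence is held out once and predicted by a model trained on the rest. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not the counts behind the reported figures.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    tp/fn count cancerlectins predicted correctly/incorrectly;
    tn/fp count non-cancerlectins predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=90, tn=80, fp=20, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts in both numerator and denominator, so it stays informative when the two classes differ in size.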

Conclusion: In this study, a new method based only on the primary structure of proteins is proposed, and experimental results show that it could be a promising tool for identifying cancerlectins. An open-access web server is made available to facilitate related work.

Keywords: Cancerlectin, cascade LDA, g-gap tripeptide composition, SVM, protein, ANOVA.
