RDR100: A Robust Computational Method for Identification of Krüppel-like Factors

Page: [584 - 599] Pages: 16


Abstract

Background: Krüppel-like factors (KLFs) are a family of zinc finger-containing transcription factors that regulate various cellular processes. KLF proteins are associated with human diseases such as cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with diverse expression profiles across numerous tissues. Given their involvement in important biological functions, accurate identification and annotation of KLF proteins are crucial. Although experimental approaches can identify KLF proteins precisely, they are complicated, slow, and expensive to apply at large scale.

Methods: In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting KLF proteins from their primary sequences. First, we identified the optimal encodings for ten different features using a recursive feature elimination approach, and then trained the respective models using five distinct machine learning (ML) classifiers.
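As a rough illustration of the workflow described in the Methods, the following minimal sketch encodes protein sequences numerically, applies recursive feature elimination, and trains a random forest. It is not the RDR100 implementation. The scikit-learn library, the amino acid composition encoding, the synthetic sequences (with cysteine and histidine enriched in the positive class, loosely mimicking a zinc-finger signature), and all parameter values are assumptions chosen for illustration only; the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_encode(sequence):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def make_toy_sequences(n_per_class, length=120, seed=0):
    """Synthetic data (assumption): positives are enriched in C and H."""
    rng = np.random.default_rng(seed)
    uniform = np.full(20, 1 / 20)
    enriched = uniform.copy()
    for aa in "CH":
        enriched[AMINO_ACIDS.index(aa)] *= 3
    enriched /= enriched.sum()
    seqs, labels = [], []
    for label, probs in ((1, enriched), (0, uniform)):
        for _ in range(n_per_class):
            idx = rng.choice(20, size=length, p=probs)
            seqs.append("".join(AMINO_ACIDS[i] for i in idx))
            labels.append(label)
    return seqs, np.array(labels)


sequences, y = make_toy_sequences(n_per_class=60)
X = np.array([aac_encode(s) for s in sequences])

# Recursive feature elimination: repeatedly fit a random forest and drop
# the least important feature until only five remain.
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X, y)
selected = [aa for aa, keep in zip(AMINO_ACIDS, selector.support_) if keep]

# Wrapping selection and classifier in one pipeline keeps feature
# selection inside each cross-validation fold, avoiding information leakage.
pipeline = Pipeline([
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=5, step=1)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
cv_scores = cross_val_score(pipeline, X, y, cv=5)

print("Selected residues:", selected)
print("Mean 5-fold CV accuracy:", cv_scores.mean())
```

In a real setting, the toy generator would be replaced by curated positive and negative sequence sets, and the same pipeline could be repeated for each encoding and classifier under comparison.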

Results: The performance of all models was assessed using independent datasets, and RDR100 was selected as the final model based on its consistent performance in cross-validation and independent evaluation.

Conclusion: Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web server is available at https://procarb.org/RDR100/.

