Prediction of DNA-binding Sites in Transcription Factors in Fur-like Proteins Using Machine Learning and Molecular Descriptors

Page: [398 - 407] Pages: 10

Abstract

Introduction: Transcription factors are of great interest in biotechnology due to their key role in the regulation of gene expression. One of the most important transcription factors in Gram-negative bacteria is Fur (ferric uptake regulator), a global regulator studied as a therapeutic target for the design of antibacterial agents. Its DNA-binding domain, which contains a helix-turn-helix motif, is one of its most relevant features.

Methods: In this study, we evaluated several machine learning algorithms, including Support Vector Machines (SVM), Random Forest (RF), Decision Trees (DT), and Naive Bayes (NB), for the prediction of DNA-binding sites based on proteins from the Fur superfamily and other helix-turn-helix transcription factors. We also tested the efficacy of several molecular descriptors derived from the amino acid sequence and from the structure of the protein fragments that bind the DNA. A feature selection procedure was employed to reduce the number of descriptors in each case while maintaining good classification performance.
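
As an illustration of what sequence-derived descriptors and a feature selection step can look like, the following minimal Python sketch computes a few simple descriptors for peptide fragments and ranks them with a Fisher-style score. It is not the descriptor set or the selection procedure used in this study, which the abstract does not specify. The choice of descriptors (Kyte-Doolittle mean hydropathy, fractions of basic and acidic residues, an approximate net charge), the function names `sequence_descriptors` and `fisher_scores`, and the example fragments are all assumptions made for illustration; the fragments are invented and are not real Fur sequences.

```python
from statistics import mean, pvariance

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(fragment: str) -> dict:
    """Compute a few simple sequence-derived descriptors for one fragment."""
    seq = fragment.upper()
    if not seq or any(aa not in KD for aa in seq):
        raise ValueError("fragment must contain only the 20 standard amino acids")
    n = len(seq)
    basic = sum(seq.count(aa) for aa in "KR")
    acidic = sum(seq.count(aa) for aa in "DE")
    return {
        "length": float(n),
        "gravy": mean(KD[aa] for aa in seq),  # mean hydropathy
        "frac_basic": basic / n,
        "frac_acidic": acidic / n,
        # Crude approximation: counts only K/R (+1) and D/E (-1);
        # histidine and the termini are ignored.
        "net_charge": float(basic - acidic),
    }


def fisher_scores(rows: list, labels: list) -> list:
    """Rank descriptors by (gap between class means)^2 / (sum of class variances)."""
    scores = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        gap = (mean(pos) - mean(neg)) ** 2
        spread = pvariance(pos) + pvariance(neg)
        scores[name] = gap / (spread + 1e-9)  # small constant avoids division by zero
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical fragments: 1 = DNA-binding, 0 = non-binding (invented data).
    fragments = ["KRTRQRAKILE", "RKSNRTKVYRH", "DELGLVAEDS", "AEGLDIPVLE"]
    labels = [1, 1, 0, 0]
    rows = [sequence_descriptors(f) for f in fragments]
    for name, score in fisher_scores(rows, labels):
        print(f"{name}: {score:.3f}")
```

In a workflow of the kind described above, the top-ranked descriptors would then be passed to a classifier such as an SVM or a decision tree, and the number of retained descriptors would be chosen by checking that classification accuracy does not degrade.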

Results: The best results were obtained with the SVM model using twelve sequence-derived attributes and the DT model using nine structure-derived features, achieving 82% and 76% accuracy, respectively.

Conclusion: The performance obtained indicates that the descriptors we used are relevant for predicting DNA-binding sites since they can discriminate between binding and non-binding regions of a protein.
