Abstract
Background: Krüppel-like factors (KLFs) are a family of zinc finger-containing transcription factors
that regulate various cellular processes. KLF proteins are associated with human diseases, such as
cancer, cardiovascular diseases, and metabolic disorders. The KLF family consists of 18 members with
diverse expression profiles across numerous tissues. Accurate identification and annotation of KLF proteins
are crucial, given their involvement in important biological functions. Although experimental approaches
can identify KLF proteins precisely, large-scale identification is complicated, slow, and expensive.
Methods: In this study, we developed RDR100, a novel random forest (RF)-based framework for predicting
KLF proteins based on their primary sequences. First, we identified the optimal encodings for ten
different features using a recursive feature elimination approach, and then trained their respective models
using five distinct machine learning (ML) classifiers.
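The feature selection step described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses amino acid composition as a stand-in encoding, since the abstract does not name the ten features. The toy sequences, the enriched residues, the number of retained features, and all function names are hypothetical choices for illustration only, not the RDR100 pipeline itself.

```python
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_encode(sequence):
    """Return the 20-dimensional amino acid composition of a sequence."""
    length = len(sequence)
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


def make_toy_sequences(n, enriched, seed):
    """Generate synthetic sequences in which `enriched` residues are oversampled."""
    rng = random.Random(seed)
    pool = AMINO_ACIDS + enriched * 3
    return ["".join(rng.choice(pool) for _ in range(80)) for _ in range(n)]


# Synthetic stand-ins for positive and negative proteins (assumption: not real data).
positives = make_toy_sequences(40, "CH", seed=0)
negatives = make_toy_sequences(40, "", seed=1)
X = np.array([aac_encode(s) for s in positives + negatives])
y = np.array([1] * 40 + [0] * 40)

# Recursive feature elimination: refit the forest and drop the least
# important feature one at a time until 10 features remain.
selector = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=10,
    step=1,
)
selector.fit(X, y)
X_selected = selector.transform(X)

# Train a random forest on the reduced encoding.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_selected, y)

selected_residues = [aa for aa, keep in zip(AMINO_ACIDS, selector.support_) if keep]
print("Retained composition features:", selected_residues)
```

In a real workflow, the same selection would be repeated for each feature encoding, and the number of retained features would be tuned rather than fixed at 10.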
Results: The performance of all models was assessed using independent datasets, and RDR100 was selected
as the final model based on its consistent performance in cross-validation and independent evaluation.
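One way to check for consistent performance across cross-validation and independent evaluation is sketched below, assuming scikit-learn. The abstract does not list the five classifiers, so the four candidates here, the synthetic dataset, the accuracy metric, and the gap criterion are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic feature matrix standing in for encoded protein sequences.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)

# Hold out an independent set that is never seen during cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical candidate classifiers (the source does not enumerate them).
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
}

results = {}
for name, clf in candidates.items():
    # Mean 5-fold cross-validation accuracy on the training portion.
    cv_acc = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy").mean()
    # Accuracy on the independent set after refitting on all training data.
    test_acc = clf.fit(X_train, y_train).score(X_test, y_test)
    results[name] = {
        "cv": cv_acc,
        "independent": test_acc,
        "gap": abs(cv_acc - test_acc),
    }

# A small gap between the two estimates suggests consistent generalization.
for name, r in sorted(results.items(), key=lambda kv: kv[1]["gap"]):
    print(f"{name}: cv={r['cv']:.3f} independent={r['independent']:.3f} gap={r['gap']:.3f}")
```

A small gap alone is not sufficient to choose a model, so in practice it would be weighed together with the absolute scores on both evaluations.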
Conclusion: Our results demonstrate that RDR100 is a robust predictor of KLF proteins. The RDR100 web
server is available at https://procarb.org/RDR100/.