Current Bioinformatics

Author(s): Xin Wang*, Zi Meng Zhang and Chang Liu

DOI: 10.2174/0115748936347236241119045342

PKE-Ubsite: A Ubiquitylation Site Predictor for Plants Based on Multiple Encoders and Ensemble Deep Learning Framework

Abstract

Introduction: Ubiquitylation, a key post-translational modification [PTM], strongly influences the structures, activities, and functions of proteins and is linked to various diseases. Traditional experimental methods for identifying ubiquitylation sites [Ubsites] are time-consuming, expensive, and labor-intensive when prior knowledge of ubiquitylation is absent. Most prediction methods reported for Ubsites, however, are based on traditional machine learning. Owing to the increased availability of genomic and proteomic samples, deep learning-based recognition methods for Ubsites are becoming increasingly popular.

Method: In this study, we propose a new feature extraction method, pKcode, based on only seven physicochemical features of amino acids [AAs]. pKcode captures both the biochemical context and the precise sequence positions of AAs around Ubsites, improving the predictive capability for ubiquitylation. We created the pKPAP encoding scheme by integrating pKcode with PSDAAP, AAC, and PWAA, resulting in a comprehensive feature representation. Concurrently, we developed the PKE-Ubsite model.
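
Of the encodings named above, AAC (amino acid composition) is the simplest to illustrate: it is the fraction of each of the 20 standard amino acids in a sequence window. The sketch below is a minimal illustration only, not the authors' implementation; the function name `aac_encode`, the example window, and the choice to ignore padding or non-standard characters are assumptions.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_encode(window: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a sequence window.

    Characters outside the 20 standard residues (for example padding symbols)
    are ignored. This handling is an assumption, not taken from the paper.
    """
    counts = Counter(ch for ch in window.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 7-residue window centered on a lysine (K).
window = "AKKGKLA"
features = aac_encode(window)
```

Here `features` has 20 entries that sum to 1, and the entry for K is 3/7 because three of the seven residues are lysine.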

Result: The PKE-Ubsite model, a new ensemble prediction framework, combines the strengths of classifiers in five pipelines: three bidirectional long short-term memory [BiLSTM] networks, one convolutional neural network [CNN], and one random forest [RF] classifier. Each classifier uses an optimized combination of encoding features, and an integrated classification is achieved through a voting mechanism.
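
The abstract does not state which voting rule is used. As a rough sketch, assuming hard majority voting over binary labels from the five pipelines, the integration step could look like the following; the function names and the example predictions are hypothetical.

```python
def majority_vote(votes: list[int]) -> int:
    """Return 1 if more than half of the binary votes are 1, else 0."""
    if not votes:
        raise ValueError("at least one vote is required")
    return 1 if sum(votes) * 2 > len(votes) else 0


def ensemble_predict(per_model_labels: list[list[int]]) -> list[int]:
    """Combine per-classifier label lists into one label per candidate site.

    per_model_labels holds one list per classifier; each inner list has one
    0/1 label per candidate site, in the same site order.
    """
    return [majority_vote(list(site_votes)) for site_votes in zip(*per_model_labels)]


# Hypothetical predictions for three candidate sites from five pipelines
# (1 = predicted Ubsite, 0 = predicted non-Ubsite).
bilstm_a = [1, 0, 1]
bilstm_b = [1, 0, 0]
bilstm_c = [0, 0, 1]
cnn = [1, 1, 0]
rf = [0, 0, 1]

final_labels = ensemble_predict([bilstm_a, bilstm_b, bilstm_c, cnn, rf])
print(final_labels)  # [1, 0, 1]
```

With an odd number of voters, a hard majority vote on binary labels cannot tie. A soft vote that averages predicted probabilities would be an equally plausible reading of "voting mechanism".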

Conclusion: Compared with existing models on an independent test set, our model achieves an accuracy of 0.8368, an F1-score of 0.8430, a precision of 0.8124, a recall of 0.8760, and an AUC of 0.9103, values that are superior to those of all methods reported to date. Overall, PKE-Ubsite may facilitate a thorough understanding of ubiquitylation.
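
As a quick consistency check on the reported metrics, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two values quoted above. The helper name below is illustrative only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract for the independent test set.
precision = 0.8124
recall = 0.8760

f1 = f1_from_precision_recall(precision, recall)
print(f"{f1:.4f}")  # 0.8430
```

The recomputed value agrees with the reported F1-score of 0.8430 to four decimal places.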

Keywords: Lysine ubiquitylation, Bi-directional long short-term memory, Protein post-translational modifications, Deep learning, Machine learning, Ensemble classifiers.