Current Bioinformatics

Author(s): Xinyi Liao*, Xiaomei Gu* and Dejun Peng

DOI: 10.2174/1574893617666220106112044

Identification of Plasmodium Secreted Proteins Based on MonoDiKGap and Distance-Based Top-n-Gram Methods

Page: [804 - 813] Pages: 10


Abstract

Background: Many malarial infections are caused by Plasmodium falciparum. Accurate classification of the proteins secreted by the malaria parasite, which are essential for the development of anti-malarial drugs, is necessary.

Objective: This study aimed at accurately classifying the proteins secreted by the malaria parasite.

Methods: To improve the prediction accuracy for Plasmodium secreted proteins, we established a classification model, MGAP-SGD. MonoDiKGap features (k=7) were extracted from the secreted proteins, and the optimal features were then selected by the AdaBoost method. Finally, a Stochastic Gradient Descent (SGD) classifier built on the optimal feature set was used to predict the secreted proteins.
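
As an illustration of the feature-extraction step, the following is a minimal sketch of gapped mono-di counting. It assumes one plausible reading of MonoDiKGap: a single residue, followed by g skipped positions (g = 1..k), followed by a dipeptide. The abstract does not define the descriptor, so the function name `mono_di_k_gap`, the pattern layout, and the use of raw counts are all assumptions rather than the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids; other symbols (e.g. X, B, Z) are skipped.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def mono_di_k_gap(sequence, k=7):
    """Count (gap, residue, dipeptide) patterns for gap = 1..k.

    Assumed pattern layout (hypothetical, not taken from the paper):
    one residue, then `gap` skipped positions, then two adjacent residues.
    """
    seq = sequence.upper()
    counts = Counter()
    for gap in range(1, k + 1):
        span = gap + 3  # 1 residue + `gap` skipped positions + 2 residues
        for i in range(len(seq) - span + 1):
            mono = seq[i]
            di = seq[i + gap + 1 : i + gap + 3]
            if mono in AMINO_ACIDS and all(c in AMINO_ACIDS for c in di):
                counts[(gap, mono, di)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real Plasmodium protein.
    features = mono_di_k_gap("MKVLAAGIVGLCAKHM", k=7)
    for key, value in sorted(features.items())[:5]:
        print(key, value)
```

Under this assumed layout each gap value contributes up to 20 × 400 = 8,000 possible patterns, so k=7 gives up to 56,000 sparse features per sequence, which is why a feature-selection step follows extraction.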

Results: The stochastic gradient descent (SGD) classifier was validated by 10-fold cross-validation and on an independent test set, with accuracy rates of 98.5859% and 97.973%, respectively.
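
The validation protocol described here (AdaBoost-based feature selection, an SGD classifier, 10-fold cross-validation, and an independent test set) can be sketched with scikit-learn. This is a hedged illustration only: the data are synthetic, and the split ratio, hyperparameters, scaling step, and the use of `SelectFromModel` with AdaBoost feature importances are assumptions, not the authors' settings. It will not reproduce the reported accuracies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix; NOT the paper's dataset.
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=10, random_state=0
)

# Hold out an independent test set (the 80/20 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Feature selection sits inside the pipeline so that each CV fold selects
# features from its own training portion only.
model = make_pipeline(
    SelectFromModel(AdaBoostClassifier(n_estimators=50, random_state=0)),
    StandardScaler(),
    SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
)

# 10-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Refit on the full training portion and score the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"10-fold CV accuracy: {cv_scores.mean():.4f}")
print(f"Independent test accuracy: {test_accuracy:.4f}")
```

Keeping the selector inside the pipeline is a design choice of this sketch: selecting features on the full dataset before cross-validation would let information from the validation folds leak into training and inflate the cross-validated accuracy.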

Conclusion: This study confirms the effectiveness and robustness of the MGAP-SGD model, which can meet the requirements for predicting the secreted proteins of Plasmodium.

Keywords: Plasmodium, Top-n-gram, MonoDiKGap, dimensionality reduction, SGD, ANOVA.