Current Bioinformatics

Author(s): Qian Cao, Xufeng Xiao, Yannan Bin, Jianping Zhao* and Chunhou Zheng

DOI: 10.2174/0115748936330198240924110742

PredPVP: A Stacking Model for Predicting Phage Virion Proteins Based on Feature Selection Methods

Abstract

Background: Phage therapy is a promising novel therapeutic approach. Phage Virion Proteins (PVPs) recognize the host and bind to its surface receptors, which makes them highly relevant to the development of antimicrobial drugs for treating bacterial infections. In recent years, several machine-learning-based PVP predictors have been developed, but they usually train the learner on a single feature. Higher-dimensional feature representations, by contrast, tend to contain more potential sequence information.

Methods: In this work, we construct a stacking model, PredPVP, for PVP prediction by combining multiple features and applying feature selection methods. Specifically, each sequence is first encoded using seven features. Three feature selection methods are then applied to this high-dimensional representation to remove redundant features, and each selected feature set is paired with eight machine learning algorithms, giving 24 base models. Finally, the probability features and class features (PCFs) generated by the 24 base models are fed into logistic regression (LR) to train the final model.
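The stacking scheme described in the Methods can be illustrated with a minimal, scaled-down sketch. It is not the authors' implementation: the abstract does not name the seven encodings, the three feature selection methods, or the eight learners, so everything below is an assumption chosen for illustration (synthetic data from scikit-learn's make_classification, two selectors, three learners, and hypothetical helper names build_base_models and pcf_matrix). It only shows the general pattern of pairing selectors with learners, collecting probability and class features, and training an LR meta-learner on them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def build_base_models(k=20):
    """Pair every selector with every learner (2 x 3 = 6 here; the paper uses 3 x 8 = 24)."""
    # Assumed selectors and learners; the abstract does not specify which ones are used.
    selectors = {
        "anova": lambda: SelectKBest(f_classif, k=k),
        "mi": lambda: SelectKBest(mutual_info_classif, k=k),
    }
    learners = {
        "rf": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
        "knn": lambda: KNeighborsClassifier(),
        "nb": lambda: GaussianNB(),
    }
    return {
        f"{s_name}+{l_name}": make_pipeline(make_selector(), make_learner())
        for s_name, make_selector in selectors.items()
        for l_name, make_learner in learners.items()
    }


def pcf_matrix(models, X, y, cv=5):
    """Build probability and class features from out-of-fold predictions."""
    columns = []
    for model in models.values():
        # Out-of-fold positive-class probability for every training sample.
        proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
        columns.append(proba)                          # probability feature
        columns.append((proba >= 0.5).astype(float))   # class feature (0.5 threshold, assumed)
    return np.column_stack(columns)


# Synthetic stand-in for an encoded protein dataset (not real PVP data).
X, y = make_classification(n_samples=300, n_features=60, n_informative=10, random_state=0)

models = build_base_models()
pcf = pcf_matrix(models, X, y)

# Meta-learner: logistic regression trained on the stacked PCFs.
meta = LogisticRegression(max_iter=1000).fit(pcf, y)
```

Using out-of-fold predictions for the meta-features is a design choice of this sketch, intended to keep the meta-learner from seeing predictions made on data the base models were fitted to; whether PredPVP generates its PCFs this way is not stated in the abstract.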

Results: On the independent test set, PredPVP outperforms other existing predictors, with an AUC of 93.4%.

Conclusion: We expect PredPVP to serve as a tool for large-scale PVP recognition, providing a new route for the development of novel antimicrobials and accelerating their application in actual treatment. The datasets and source code used in this study are available at https://github.com/caoqian23/PredPVP.

Keywords: Phage virion protein, Feature selection, Stacking learning, Machine learning.