Predicting Protein Structural Class for Low-Similarity Sequences via Novel Evolutionary Modes of PseAAC and Recursive Feature Elimination

Page: [673 - 683] Pages: 11

Abstract

Background and Objective: Protein structural class prediction is a key first step in protein structure prediction and has become an active research area in biochemistry and bioinformatics. An important aspect of this prediction task is finding a good feature representation. Prior work has demonstrated the effectiveness of PSI-BLAST profile-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracies remain limited, which highlights the need to further explore the potential of evolutionary information.

Method: In this study, three novel sequence evolutionary modes of pseudo amino acid composition (PseAAC) are proposed and optimized by a two-stage feature selection process based on a recursive feature elimination strategy. The selected top-ranking features are then fed into a linear-kernel support vector machine classifier to predict the protein structural class. To evaluate the performance of the proposed method, jackknife tests are performed on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640).
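
The selection-and-evaluation loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy two-class problem (the real task has more structural classes), a single elimination stage rather than the two-stage process, and a simple perceptron as a stand-in for the linear-kernel support vector machine. The function names (train_perceptron, rfe, jackknife_accuracy) and the data are hypothetical.

```python
# Toy data (hypothetical): feature 0 separates the classes, the rest is noise.
X = [
    [3.0, 0.2, -0.1, 0.05],
    [-3.0, 0.1, 0.2, -0.3],
    [2.8, -0.2, 0.1, 0.2],
    [-3.1, 0.0, -0.2, 0.1],
    [3.2, 0.1, 0.0, -0.1],
    [-2.9, -0.1, 0.1, 0.3],
]
y = [1, -1, 1, -1, 1, -1]  # labels must be +1 or -1


def train_perceptron(X, y, epochs=20):
    """Fit a linear classifier sign(w.x + b); a stand-in for a linear SVM."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: move the hyperplane
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def take(X, cols):
    """Return X restricted to the given column indices."""
    return [[row[c] for c in cols] for row in X]


def rfe(X, y, n_keep):
    """Recursive feature elimination: retrain, then drop the feature
    with the smallest absolute weight, until n_keep features remain."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        w, _ = train_perceptron(take(X, kept), y)
        weakest = min(range(len(kept)), key=lambda j: abs(w[j]))
        del kept[weakest]
    return kept


def jackknife_accuracy(X, y, n_keep):
    """Leave-one-out test: each sample is predicted by a model that was
    selected and trained on all the other samples."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        kept = rfe(X_tr, y_tr, n_keep)  # selection inside the fold
        w, b = train_perceptron(take(X_tr, kept), y_tr)
        score = sum(wj * X[i][c] for wj, c in zip(w, kept)) + b
        correct += (1 if score > 0 else -1) == y[i]
    return correct / len(X)


selected = rfe(X, y, n_keep=2)
accuracy = jackknife_accuracy(X, y, n_keep=2)
print("selected feature indices:", selected)
print("jackknife accuracy:", accuracy)
```

Running the selection inside each jackknife fold is a design choice of this sketch that keeps the held-out sample from influencing which features are kept; the abstract does not state how the authors combined the two steps. In practice, a library such as scikit-learn (RFE, SVC with kernel="linear", LeaveOneOut) would likely replace the hand-written pieces.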

Results: In a comprehensive comparison with current state-of-the-art methods, the proposed method achieves superior performance. The overall accuracies on the 25PDB, 1189 and 640 datasets are 96.2%, 97.9% and 99.5%, respectively, which are 1.9%, 1.5% and 2.3% higher than those of the previous best-performing method.

Conclusion: The satisfactory prediction accuracies achieved by the proposed method are attributed to the specially designed sequence evolutionary modes of PseAAC and the effective feature selection strategy, which together capture more discriminative sequence-order information. It is anticipated that the method will be helpful in other prediction problems in protein research.

Keywords: Feature selection, position specific score matrix, protein structural class, recursive feature elimination, sequence similarity, support vector machine.
