Deep Learning Model for Protein Disease Classification

Page: [245 - 253] Pages: 9

Abstract

Background: Protein sequence analysis helps predict protein function. As the number of known proteins grows, analyzing them and studying their similarity becomes increasingly challenging for bioinformaticians. Most existing protein analysis methods use Support Vector Machines. Deep learning has received comparatively little attention in protein analysis, and little work has addressed the classification of protein diseases.

Objective: This paper presents a deep learning approach that classifies protein diseases based on protein descriptors.

Methods: Different protein descriptors are used and decomposed into modified feature descriptors. As a novel element, we use a Convolutional Neural Network (CNN) model to learn and classify protein diseases. The modified feature descriptors are fed to the CNN model on a dataset of 1563 protein sequences classified into 3 disease classes: AIDS, Tumor suppressor, and Proto-oncogene.
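The abstract does not name the specific protein descriptors used. As an illustration of what a sequence-derived descriptor can look like, the following minimal sketch computes amino acid composition, a common descriptor that is assumed here purely as an example and is not confirmed to be one of the descriptors in this paper. The function name `amino_acid_composition` and the residue ordering are hypothetical choices.

```python
# Illustrative sketch only: amino acid composition (AAC) is an assumed
# example descriptor, not necessarily one used in the paper.

# The 20 standard amino acids in a fixed (alphabetical, hypothetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (for example 'X') are ignored when counting.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        # No standard residues: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = amino_acid_composition("ACCA")
    print(len(vector), vector[0], vector[1])
```

A fixed-length vector of this kind is one plausible input format for a classifier; the decomposition into modified feature descriptors described in the Methods is not reproduced here.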

Results: Using the modified feature descriptors, the Convolutional Neural Network model shows a significant performance increase over Support Vector Machine models with different kernel functions. For one modified feature descriptor, the improvements are 19.8%, 27.9%, 17.6%, 21.5%, 17.3%, and 22% in Area Under the Curve, Matthews Correlation Coefficient, Accuracy, F1-score, Recall, and Precision, respectively.
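For readers unfamiliar with the metrics listed above, the following minimal sketch computes Accuracy, Precision, Recall, F1-score, and Matthews Correlation Coefficient for one class treated as positive against the rest. The one-vs-rest treatment, the function name `binary_metrics`, and the toy labels are assumptions for illustration; the abstract does not state how the paper aggregates metrics across the three classes. Area Under the Curve is omitted because it requires prediction scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred, positive):
    """Compute one-vs-rest metrics for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
    # defined here as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Toy labels, invented for illustration; not data from the paper.
    truth = ["AIDS", "AIDS", "Tumor", "Proto", "AIDS", "Tumor"]
    preds = ["AIDS", "Tumor", "Tumor", "Proto", "AIDS", "AIDS"]
    print(binary_metrics(truth, preds, positive="AIDS"))
```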

Conclusion: The results show that the proposed CNN model, trained on modified feature descriptors, significantly outperforms the Support Vector Machine model in prediction.

Keywords: Protein prediction, disease classification, CNN, EMD, IMF, amino acids.
