Antioxidant Proteins’ Identification Based on Support Vector Machine

Yuanke Xu; Yaping Wen; Guosheng Han

Abstract

Background: Evidence increasingly indicates that, in human disease, cell metabolism is closely associated with proteins. Structural mutations and dysregulation of these proteins contribute to the development of complex diseases. Free radicals are unstable molecules that take electrons from surrounding atoms to gain stability. Once a free radical binds to an atom in the body, a chain reaction occurs that damages cells and DNA. An antioxidant protein is a substance that protects cells from free radical damage. Accurate identification of antioxidant proteins is important for understanding their role in delaying aging and in preventing and treating related diseases. Computational methods for identifying antioxidant proteins have therefore become an effective way to pinpoint candidates before experimental verification.

Methods: In this study, a support vector machine (SVM) was used to identify antioxidant proteins. Amino acid composition and 9-gap dipeptide composition were used as features, and the feature dimensionality was reduced by Principal Component Analysis (PCA).
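To make the two feature types concrete, the following is a minimal sketch of how they could be computed from a sequence. It assumes the common definition of a g-gap dipeptide (the frequency of residue pairs separated by g intervening residues) and simply concatenates both vectors into 420 values before any PCA step. The function names and the handling of non-standard residues are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 frequencies, one per standard amino acid."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS)
    return [seq.count(a) / total if total else 0.0 for a in AMINO_ACIDS]


def gap_dipeptide_composition(seq, gap=9):
    """Return 400 frequencies of residue pairs separated by `gap` residues.

    Assumption: pairs containing a non-standard residue are skipped,
    and frequencies are normalized by the number of pairs counted.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts.values()]


def extract_features(seq, gap=9):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + gap_dipeptide_composition(seq, gap)
```

Sequences shorter than gap + 2 residues contain no such pair, so in this sketch their dipeptide part is all zeros.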

Results: The prediction accuracy (Acc) reached 98.38%, the recall on positive samples (sensitivity, Sn) was 99.27%, the recall on negative samples (specificity, Sp) was 97.54%, and the Matthews correlation coefficient (MCC) was 0.9678. To further evaluate the proposed method, we tested it on 20 antioxidant proteins from the National Center for Biotechnology Information (NCBI); all 20 were correctly predicted. These results indicate that the method outperforms state-of-the-art methods for identifying antioxidant proteins.
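For reference, the four indices reported here are conventionally derived from the confusion matrix. The sketch below uses the standard definitions; the function name and the counts in the usage line are hypothetical and are not the confusion matrix of this study.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sn = tp / (tp + fn)                      # recall on positive samples
    sp = tn / (tn + fp)                      # recall on negative samples
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the two classes differ greatly in size.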

Conclusion: We collected experimental protein data from UniProt, comprising 253 antioxidant proteins and 1552 non-antioxidant proteins. The optimal feature set in this paper combines amino acid composition and 9-gap dipeptide composition. Proteins are classified by a support vector machine, and the evaluation indices are obtained by 5-fold cross-validation. Comparison with existing classification models further shows that the SVM model constructed in this paper is useful for identifying antioxidant proteins.
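As an illustration of the 5-fold protocol, the sketch below partitions a data set of the stated size (253 positive, 1552 negative) into five folds. Stratifying by class is an assumption made here so that each fold keeps roughly the same class ratio; the abstract does not say whether the folds were stratified or whether the classes were rebalanced, and neither the SVM nor PCA is modeled. All names are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, preserving class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for t, test in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != t for i in f]
        yield train, test


# 1 = antioxidant protein, 0 = non-antioxidant protein (sizes from the text).
labels = [1] * 253 + [0] * 1552
folds = stratified_folds(labels)
```

In each round a model would be fitted on the four training folds and scored on the held-out fold, and the reported indices would be aggregated over the five rounds.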

Keywords: 9-gap dipeptide, antioxidant proteins, non-antioxidant proteins, principal component analysis, SVM, 5-fold cross-validation.


Combinatorial Chemistry & High Throughput Screening