An Integrated-OFFT Model for the Prediction of Protein Secondary Structure Class

Pages: 45-54 (10 pages)

Abstract

Background: Proteins are among the most versatile macromolecules and play crucial roles in many biological processes. The amino acid sequence has long been used to predict protein secondary structure. However, the major methods for predicting protein secondary structure class have so far not considered the impact of Gaussian noise on the numerical representation of amino acid sequences, even though such noise is one of the important constraints on the functionality of a protein.

Methods: In the present study, protein secondary structure class was predicted by jointly applying the Stockwell transform (S-transform) and Amino Acid Composition (AAC) to the Electron-Ion Interaction Potential (EIIP) representation of the raw amino acid sequence. The method was evaluated on four benchmark datasets of low sequence homology: PDB25, 498, 277, and 204. A random forest classifier with out-of-bag (OOB) error estimation and a Support Vector Machine (SVM) with k-fold cross-validation both demonstrated the strong feature representation potential of the proposed approach.
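
The feature-extraction step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the EIIP table uses values as commonly tabulated in the resonant recognition model literature (check them against the original source before use), the S-transform follows the standard FFT-based discrete formulation of Stockwell et al., and combining AAC with the mean, standard deviation, maximum, and energy of |S| is a hypothetical choice that yields a fixed-length vector regardless of sequence length.

```python
import numpy as np

# Assumed EIIP values (Rydberg units) as commonly tabulated in the resonant
# recognition model literature; verify against the original table before use.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}
AMINO_ACIDS = sorted(EIIP)  # fixed alphabetical order for the 20 AAC features


def eiip_signal(sequence):
    """Map a protein sequence to its numeric EIIP signal (non-standard residues are skipped)."""
    return np.array([EIIP[aa] for aa in sequence.upper() if aa in EIIP])


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    residues = [aa for aa in sequence.upper() if aa in EIIP]
    return np.array([residues.count(aa) / len(residues) for aa in AMINO_ACIDS])


def stockwell_transform(signal):
    """Discrete S-transform via the FFT; row n is the voice at frequency index n."""
    x = np.asarray(signal, dtype=float)
    n_samples = x.size
    spectrum = np.fft.fft(x)
    # Frequency indices in FFT order (0, 1, ..., -1) so the Gaussian window is symmetric.
    m = np.fft.fftfreq(n_samples) * n_samples
    voices = np.empty((n_samples // 2 + 1, n_samples), dtype=complex)
    voices[0, :] = x.mean()  # the zero-frequency voice is the signal mean
    for n in range(1, n_samples // 2 + 1):
        gaussian = np.exp(-2.0 * np.pi**2 * m**2 / n**2)
        # Shift the spectrum by n (H[m + n]), apply the window, and return to the time axis.
        voices[n, :] = np.fft.ifft(np.roll(spectrum, -n) * gaussian)
    return voices


def feature_vector(sequence):
    """Hypothetical fixed-length features: 20 AAC fractions plus 4 statistics of |S|."""
    magnitude = np.abs(stockwell_transform(eiip_signal(sequence)))
    st_stats = [magnitude.mean(), magnitude.std(), magnitude.max(), (magnitude**2).sum()]
    return np.concatenate([amino_acid_composition(sequence), st_stats])


features = feature_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(features.shape)  # (24,)
```

Summarizing |S| rather than flattening it is what keeps the vector length independent of the protein length; the sequence must contain at least one standard residue for the composition to be defined.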

Results: With the random forest classifier, the overall prediction accuracies for the PDB25, 498, 277, and 204 datasets were 92.5%, 94.79%, 92.45%, and 88.04%, respectively; with the SVM, they were 84.66%, 95.32%, 89.29%, and 84.37%, respectively.
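
The accuracies above come from two different evaluation protocols: out-of-bag accuracy for the random forest and k-fold cross-validation for the SVM. Both can be sketched with scikit-learn as below. The data are synthetic placeholders from make_classification, not the PDB25, 498, 277, or 204 benchmarks, so the printed numbers say nothing about the reported results; the four-class setup, the 200 trees, the RBF kernel with default parameters, feature standardization, and k = 10 are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate(features, labels, n_folds=10, seed=0):
    """Return (random forest OOB accuracy, mean SVM k-fold CV accuracy)."""
    # Random forest: each sample is scored only by trees whose bootstrap sample excluded it.
    forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=seed)
    forest.fit(features, labels)
    oob_accuracy = forest.oob_score_  # OOB error estimate = 1 - oob_accuracy

    # SVM: standardize inside the pipeline so scaling is refit on each training fold.
    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    cv_accuracy = cross_val_score(svm, features, labels, cv=folds, scoring="accuracy").mean()
    return oob_accuracy, cv_accuracy


# Placeholder data standing in for 24-dimensional feature vectors from four structural classes.
X, y = make_classification(n_samples=400, n_features=24, n_informative=10,
                           n_classes=4, random_state=0)
oob_acc, svm_acc = evaluate(X, y)
print(f"RF OOB accuracy: {oob_acc:.3f}, SVM 10-fold CV accuracy: {svm_acc:.3f}")
```

The OOB estimate reuses the samples each bootstrap leaves out, so the forest needs no separate validation split, whereas the SVM has no built-in equivalent and needs explicit folds.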

Conclusion: An integrated order-function-frequency-time (OFFT) model has been proposed for predicting protein secondary structure class. For the first time, we report the effect of Gaussian noise on the prediction accuracy of protein secondary structure class and propose a robust integrated-OFFT model that is effectively noise-resistant.
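
A noise-robustness check of the kind mentioned in the Background and Conclusion can be sketched as follows: zero-mean white Gaussian noise is added to a numeric (for example, EIIP) signal at a chosen signal-to-noise ratio, and the resulting change in a feature vector is measured. The abstract does not specify the authors' noise protocol, so the SNR definition (total signal power, mean included), the relative L2 change as a robustness score, the FFT magnitude used as a stand-in feature, and the uniform placeholder signal are all assumptions for illustration.

```python
import numpy as np


def add_gaussian_noise(signal, snr_db, rng):
    """Return signal plus zero-mean white Gaussian noise at the requested SNR in dB."""
    x = np.asarray(signal, dtype=float)
    signal_power = np.mean(x**2)  # assumption: SNR uses total power, mean included
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def measured_snr_db(clean, noisy):
    """Empirical SNR of the noisy signal relative to the clean one, in dB."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noisy, dtype=float) - clean
    return 10.0 * np.log10(np.mean(clean**2) / np.mean(noise**2))


def relative_feature_change(feature_fn, clean, noisy):
    """Hypothetical robustness score: relative L2 change of the features under noise."""
    f_clean = feature_fn(clean)
    return np.linalg.norm(feature_fn(noisy) - f_clean) / np.linalg.norm(f_clean)


def spectrum_magnitude(signal):
    """Stand-in feature extractor: magnitude of the one-sided FFT."""
    return np.abs(np.fft.rfft(signal))


rng = np.random.default_rng(0)
# Placeholder for an EIIP signal: values drawn from the EIIP range (about 0 to 0.13).
clean = rng.uniform(0.0, 0.13, size=4000)
for snr_db in (5, 15, 30):
    noisy = add_gaussian_noise(clean, snr_db, rng)
    change = relative_feature_change(spectrum_magnitude, clean, noisy)
    print(f"SNR {snr_db:>2} dB: measured {measured_snr_db(clean, noisy):.1f} dB, "
          f"feature change {change:.3f}")
```

A noise-resistant feature set is one whose score stays small as the SNR falls; the same loop can wrap any feature extractor that maps a signal to a fixed-length vector.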

Keywords: Protein, secondary structure class prediction, Gaussian noise, computational biology, bioinformatics, SVM.
