Protein Secondary Structure Prediction Using Character Bi-gram Embedding and Bi-LSTM

Page: [333 - 338] Pages: 6

Abstract

Background: Protein secondary structure is vital to predicting the tertiary structure, which is essential in determining protein function and in drug design. Computational methods that predict secondary structure from the primary sequence are therefore in high demand. A protein primary sequence is represented as a linear sequence of characters drawn from the twenty amino acids, and it contains the contextual information needed for secondary structure prediction.

Objective and Methods: Protein secondary structure is predicted from the primary sequence using a deep recurrent neural network. Secondary structure depends on both local and long-range residues in the primary sequence. In the proposed work, the local contextual information of amino acid residues is captured with character n-grams, and a dense embedding vector represents this local context. A bidirectional long short-term memory (Bi-LSTM) model then captures the long-range context by extracting information from both past and future residues in the primary sequence.
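As an illustration of the character n-gram step, the following minimal Python sketch splits a primary sequence into bi-grams and maps each one to an integer index that an embedding layer could then look up. The overlapping stride-1 window, the reserved index 0 for unknown pairs, and the names AMINO_ACIDS, build_bigram_vocab, to_bigrams, and encode are assumptions made for this example and are not taken from the paper.

```python
from itertools import product

# The twenty standard amino acids, one character each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_bigram_vocab(alphabet=AMINO_ACIDS):
    """Map every possible residue pair to an integer id starting at 1.

    Index 0 is left free for padding or unknown pairs (an assumption here).
    """
    pairs = product(alphabet, repeat=2)
    return {a + b: i for i, (a, b) in enumerate(pairs, start=1)}


def to_bigrams(sequence):
    """Slide a window of width 2 over the sequence with stride 1."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def encode(sequence, vocab):
    """Turn a primary sequence into a list of bi-gram ids (0 if unknown)."""
    return [vocab.get(bigram, 0) for bigram in to_bigrams(sequence)]


if __name__ == "__main__":
    vocab = build_bigram_vocab()
    print(to_bigrams("MKVLA"))
    print(encode("MKVLA", vocab))
```

With twenty residues the vocabulary holds 400 bi-grams, so each index can be mapped to a dense vector by an embedding table of 401 rows (one extra row for index 0) before the sequence is passed to a recurrent model.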

Results: The efficacy of the proposed deep recurrent architecture is evaluated on three datasets: ss.txt, RS126, and CASP9. The model achieves Q3 accuracies of 88.45%, 83.48%, and 86.69% on ss.txt, RS126, and CASP9, respectively. Its performance is also compared with other state-of-the-art methods available in the literature.
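Q3 accuracy is the percentage of residues whose three-state label (helix H, strand E, or coil C) is predicted correctly. The sketch below shows one way to compute it for a single sequence; the function name q3_accuracy and the string-based label format are assumptions for this example, not the authors' evaluation code.

```python
def q3_accuracy(true_labels, predicted_labels):
    """Return the percentage of positions where the three-state labels agree.

    Both arguments are equal-length sequences over the states H, E, and C.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


if __name__ == "__main__":
    observed = "HHHEEECC"
    predicted = "HHHEECCC"
    # Seven of the eight positions agree.
    print(q3_accuracy(observed, predicted))
```

To score a whole dataset, the same count can be accumulated over every residue of every protein before dividing, which weights long and short proteins by their length.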

Conclusion: The comparative analysis shows that the proposed model performs better than the state-of-the-art methods it was compared with.

Keywords: Proteomics, protein secondary structure, amino acid sequence, character n-gram embedding, deep learning, bidirectional long short-term memory.
