Abstract
Background: The prediction of a protein's secondary structure from its amino acid sequence
is an essential step towards predicting its 3-D structure. Prediction performance improves
when homologous multiple sequence alignment information is incorporated. However, such homologous
information is not available for all proteins, so it is necessary to predict protein secondary
structure from single sequences.
Objective and Methods: Protein secondary structure is predicted from primary sequences using
n-gram word embedding and a deep recurrent neural network. Protein secondary structure depends
on both local and long-range neighboring residues in the primary sequence. In the proposed work,
variable-length character n-gram words capture the local contextual information of amino acid
residues, and an embedding vector represents each of these n-gram words. Further, a bidirectional
long short-term memory (Bi-LSTM) model is used to capture long-range context by extracting
information from both past and future residues in the primary sequence.
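As an illustration of the local-context step, the following minimal sketch extracts variable-length character n-grams from a primary sequence and maps them to integer indices that an embedding layer could consume. The function names (`char_ngrams`, `build_vocab`), the n-gram range of 1 to 3, and the choice to anchor n-grams at each residue position are assumptions made for illustration; they are not taken from the proposed model.

```python
from typing import Dict, Iterable, List


def char_ngrams(sequence: str, min_n: int = 1, max_n: int = 3) -> List[List[str]]:
    """For each residue position, list the n-grams (min_n..max_n) starting there.

    The n-gram range is an illustrative assumption, not the paper's setting.
    """
    per_residue = []
    for i in range(len(sequence)):
        grams = [
            sequence[i:i + n]
            for n in range(min_n, max_n + 1)
            if i + n <= len(sequence)  # skip n-grams that run past the end
        ]
        per_residue.append(grams)
    return per_residue


def build_vocab(sequences: Iterable[str], min_n: int = 1, max_n: int = 3) -> Dict[str, int]:
    """Assign an integer index to every distinct n-gram; index 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for grams in char_ngrams(seq, min_n, max_n):
            for gram in grams:
                vocab.setdefault(gram, len(vocab))
    return vocab


if __name__ == "__main__":
    toy_sequence = "MKV"  # hypothetical three-residue fragment
    print(char_ngrams(toy_sequence))
    print(build_vocab([toy_sequence]))
```

Each index would then select one row of a learned embedding matrix, and the resulting per-residue vectors would form the input to the recurrent layers.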
Results: The proposed model is evaluated on three public datasets: ss.txt, RS126, and CASP9. The
model achieves Q3 accuracies of 92.57%, 86.48%, and 89.66% on ss.txt, RS126, and CASP9, respectively.
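For reference, Q3 accuracy is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The sketch below computes it for a single sequence. The eight-to-three state reduction shown (G, H, I to H; E, B to E; all others to C) is one common convention and is an assumption here, as are the function names; the reduction used in the reported experiments may differ.

```python
# One common DSSP 8-state to 3-state reduction (an assumption, not the paper's stated mapping).
EIGHT_TO_THREE = {"G": "H", "H": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_3_state(ss8: str) -> str:
    """Map an 8-state secondary structure string to H/E/C; unlisted symbols become coil."""
    return "".join(EIGHT_TO_THREE.get(symbol, "C") for symbol in ss8)


def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of positions where the predicted 3-state label matches."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted strings must have the same length")
    if not true_ss:
        raise ValueError("cannot compute Q3 on an empty sequence")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


if __name__ == "__main__":
    observed = reduce_to_3_state("GHIEBTS-")  # hypothetical 8-state annotation
    predicted = "HHHECCCC"                    # hypothetical model output
    print(observed, q3_accuracy(observed, predicted))
```

When Q3 is reported for a whole dataset, the counts are often pooled over all residues rather than averaged per sequence; which variant applies to the figures above is not stated here.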
Conclusion: The performance of the proposed model is compared with state-of-the-art methods
available in the literature. The comparative analysis shows that the proposed model performs
better than these methods.
Keywords: Proteomics, protein secondary structure, amino acid sequence, character n-gram embedding,
deep learning, bidirectional long short-term memory.