Recent Advances in Computer Science and Communications

Author(s): Somayeh Khajehasani and Louiza Dehyadegari*

DOI: 10.2174/2213275912666190411113728

Speech Recognition Using Elman Artificial Neural Network and Linear Predictive Coding

Pages: 650-656 (7 pages)


Abstract

Background: The growing demand for automatic intelligent systems has drawn increasing attention to modern techniques for interaction between humans and machines. These techniques generally fall into two types: audio and visual methods. Algorithms that enable machines to recognize human speech are therefore of high importance and are frequently studied by researchers.

Objective: Artificial intelligence methods have led to better results in human speech recognition, but a basic problem remains: there is no appropriate strategy for selecting the data relevant to recognition from the huge amount of speech information, which makes the available algorithms practically impossible to apply.

Method: To solve this problem, this article uses linear predictive coding (LPC) coefficient extraction to summarize the data for the pronunciation of English digits. The extracted database is then fed to an Elman neural network, which learns the relation between the LPC coefficients of an audio file and the pronounced digit.
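The LPC extraction step can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion and Hamming-windowed frames; the abstract does not state which LPC variant, frame length, hop size, or model order the authors used, so the function names `lpc_coefficients` and `lpc_features` and all default values below are hypothetical.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate `order` LPC predictor coefficients for one frame.

    Uses the autocorrelation method with the Levinson-Durbin recursion
    (an assumption; the paper's exact variant is not given).
    Returns c such that x[n] is approximated by sum_k c[k-1] * x[n-k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)   # error-filter coefficients, a[0] = 1
    a[0] = 1.0
    err = r[0]                # prediction error energy
    for i in range(1, order + 1):
        if err <= 0:          # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err        # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return -a[1:]

def lpc_features(signal, frame_len=240, hop=120, order=10):
    """Split a signal into Hamming-windowed frames and return one LPC
    vector per frame (frame_len, hop, and order are illustrative values)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([lpc_coefficients(signal[s:s + frame_len] * window, order)
                     for s in starts])
```

Each spoken digit is thereby reduced from raw samples to a small matrix with one row of coefficients per frame, which is the kind of compact input a recurrent network can consume frame by frame.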

Results: The method performs well compared with other methods. In the experiments, network training reached 99% recognition accuracy, and the Elman network still outperformed the RBF network despite many errors.

Conclusion: The experiments showed that the Elman memory neural network performs acceptably in recognizing the speech signal compared with the other algorithms. Using the linear predictive coding coefficients together with the Elman neural network led to higher recognition accuracy and improved the speech recognition system.
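The "memory" of an Elman network comes from context units that hold a copy of the previous hidden state. The following sketch shows only the forward pass over a sequence of per-frame feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the class name `ElmanNetwork`, the layer sizes, the tanh and softmax activations, and the random untrained weights are all hypothetical, and training is omitted.

```python
import numpy as np

class ElmanNetwork:
    """Minimal Elman (simple recurrent) network, forward pass only."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; a real system would learn these.
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.W_ctx = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b_out = np.zeros(n_out)

    def forward(self, frames):
        """Return class probabilities after reading all frames in order."""
        context = np.zeros(self.W_ctx.shape[0])
        for x in frames:
            hidden = np.tanh(self.W_in @ x + self.W_ctx @ context + self.b_h)
            context = hidden  # context units copy the hidden state
        logits = self.W_out @ context + self.b_out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

# Usage: 7 frames of 10 coefficients each, 10 output classes (digits 0-9).
net = ElmanNetwork(n_in=10, n_hidden=16, n_out=10)
frames = np.random.default_rng(1).standard_normal((7, 10))
probabilities = net.forward(frames)
predicted_digit = int(np.argmax(probabilities))
```

Because the context weights feed the previous hidden state back into the current one, the output depends on the whole frame sequence rather than on the last frame alone.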

Keywords: Speech recognition, Digit recognition, Linear predictive coding, Recurrent neural networks, Elman neural networks, Machine learning.

