CORDIC KSVD based Online Dictionary Learning for Speech Enhancement on ASIC/FPGA Platforms

Article ID: e110522204582 Pages: 10

Abstract

Background: Enhancing real-world speech signals remains challenging because of noise such as reverberation, background, street, and babble noise. Learning-based methods such as dictionary learning have recently become popular and have shown promising results in speech enhancement. Among the many sparse representation algorithms, the K-singular value decomposition (KSVD) algorithm is best suited for dictionary learning, and orthogonal matching pursuit (OMP) based signal recovery gives the best enhancement results. FPGAs and ASICs are widely used to accelerate speech enhancement applications; FPGAs in particular are common in healthcare and consumer applications, where speech enhancement plays a crucial role.

Methods: This paper proposes a modified KSVD algorithm that can easily be implemented on hardware platforms such as FPGAs and ASICs. Instead of using double-precision arithmetic for the singular value decomposition part of the KSVD algorithm, we propose to use CORDIC (Coordinate Rotation Digital Computer) based QR decomposition and QR-based singular value decomposition in dictionary learning.
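
As a rough illustration of the CORDIC-based QR step named above, the following minimal sketch triangularizes a small matrix with Givens rotations built only from CORDIC micro-rotations. It is not the authors' implementation: the function names (cordic_vectoring, cordic_rotate, cordic_qr) and the iteration count N_ITER are hypothetical, and floating-point multiplications by powers of two stand in for the fixed-point shift-and-add operations that hardware would use.

```python
import math

N_ITER = 16  # micro-rotations per CORDIC operation (illustrative assumption)

# The CORDIC gain depends only on N_ITER, so a single constant compensates it.
K = 1.0
for _i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_vectoring(x, y):
    """Find the micro-rotation directions that drive y towards zero."""
    flip = x < 0.0          # pre-rotate by 180 degrees so that x >= 0
    if flip:
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = -1 if y > 0.0 else 1
        # In hardware, the factor 2**-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return flip, dirs


def cordic_rotate(a, b, flip, dirs):
    """Apply a recorded rotation to the pair (a, b), with gain correction."""
    if flip:
        a, b = -a, -b
    for i, d in enumerate(dirs):
        a, b = a - d * b * 2.0 ** -i, b + d * a * 2.0 ** -i
    return a * K, b * K


def cordic_qr(A):
    """Givens-rotation QR of a list-of-lists matrix A, so that A ~= Q R."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for c in range(min(m - 1, n)):
        for q in range(c + 1, m):
            # Choose the rotation that zeroes R[q][c] against the pivot R[c][c].
            flip, dirs = cordic_vectoring(R[c][c], R[q][c])
            # Apply the same rotation to rows c and q of both R and Q^T.
            for M in (R, Qt):
                for j in range(len(M[0])):
                    M[c][j], M[q][j] = cordic_rotate(M[c][j], M[q][j], flip, dirs)
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]  # transpose of Q^T
    return Q, R


A = [[4.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 2.0, 5.0]]
Q, R = cordic_qr(A)
```

With a finite number of micro-rotations, the entries below the diagonal of R are driven close to zero rather than exactly to zero; the residual shrinks as N_ITER grows, which in a fixed-point design would be traded against word length and latency.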

Results: Combined with the CORDIC algorithm, the proposed KSVD algorithm reduces the processing time by a factor of 7 to 8.

Conclusion: The findings indicate that the proposed work is well suited to FPGA or ASIC platforms.

Keywords: K-singular value decomposition, FPGA, ASIC, CORDIC, dictionary learning, sparse representation.
