The Inverse Edit Term Frequency for Informal Word Conversion Using Soundex for Analysis of Customer’s Reviews

Monika Arora; Vineet Kansal

Abstract

Background: E-commerce and M-commerce have emerged as new ways of doing business, and both require understanding customers' needs precisely. With the advent of technology, mobile devices have become vital tools, and smartphones have changed the way people communicate: a user can access information with a single click, and text messages have become a basic channel of interaction. Customers' use of informal text messages poses a challenge for businesses, because needs expressed informally through the short message service (SMS) are represented inaccurately, which creates a gap between what customers actually require and what businesses understand.

Objective: Informally written text messages have attracted researchers' attention as textual data to be analyzed and normalized. In this paper, SMS data are analyzed for information retrieval using the Soundex phonetic algorithm and its variations.
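For readers unfamiliar with Soundex, the following is a minimal sketch of the classic (American) Soundex coding, not the authors' implementation or any of the variations they evaluate. The function name `soundex` and the example words are illustrative assumptions.

```python
# Minimal sketch of classic American Soundex (an illustration, not the
# paper's code). Consonants are grouped into six digit classes; vowels
# and 'y' carry no code; 'h' and 'w' are skipped without breaking a run.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of `word` ('' if no letters)."""
    # Keep only ASCII letters, so digits and punctuation are ignored.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")  # code of the first letter also counts
    for ch in letters[1:]:
        if ch in "hw":
            continue  # 'h' and 'w' do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev to "", allowing a repeat
        if len(encoded) == 4:
            break
    return encoded.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for sms, full in (("txt", "text"), ("tmrw", "tomorrow")):
        print(sms, soundex(sms), full, soundex(full))
```

In this sketch the SMS forms "txt" and "tmrw" receive the same codes as "text" and "tomorrow", which is the kind of phonetic match the approach relies on. Plain Soundex does not match every abbreviation, which is presumably one motivation for considering variations.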

Methods: Two datasets, the SMS-based FAQ dataset of FIRE 2012 and a self-generated survey dataset, were used to evaluate the performance of the proposed Soundex phonetic algorithm.

Results: Applying Soundex with Inverse Edit Term Frequency significantly improved the lexical similarity between SMS words and natural language text. The reported results support this observation.

Conclusion: Among the Soundex variations examined, Soundex with the Inverse Edit Term Frequency Distribution algorithm is the best suited. It normalizes informally written text and retrieves the exact match from the bag of words.
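The abstract does not define the Inverse Edit Term Frequency score, so the sketch below is only one plausible reading and should not be taken as the authors' formula. It assumes the candidates already share a Soundex code with the SMS token and ranks them by term frequency divided by one plus the Levenshtein edit distance. The names `edit_distance`, `rank_candidates`, and the sample frequencies are hypothetical.

```python
# Hedged sketch: rank candidate words for an SMS token.
# ASSUMPTION (not from the paper): score = term_frequency / (1 + edit_distance).
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def rank_candidates(sms_token: str,
                    term_freq: Dict[str, int]) -> List[Tuple[str, float]]:
    """Sort candidates by the assumed score, highest first.

    `term_freq` maps each candidate word (assumed to share the token's
    Soundex code) to its frequency in the bag of words.
    """
    scored = [(word, freq / (1 + edit_distance(sms_token, word)))
              for word, freq in term_freq.items()]
    # Ties are broken alphabetically so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    # Hypothetical frequencies; all three words have Soundex code T230.
    candidates = {"text": 10, "toast": 10, "taxed": 2}
    print(rank_candidates("txt", candidates))
```

Dividing by one plus the distance keeps the score finite for an exact match and lets a frequent but distant word lose to a closer one, which is the intuition suggested by the name of the measure.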

Keywords: E-commerce, phonetic algorithms, mobile, smart phones, text messages, SMS, information retrieval, edit distance, term frequency.

Cite As

Recent Advances in Computer Science and Communications
