Recent Advances in Computer Science and Communications

Author(s): Shivani Rana, Rakesh Kanji and Shruti Jain*

DOI: 10.2174/2666255816666230507182018

Automated System for Movie Review Classification using BERT

Article ID: e070523216620 Pages: 9

Abstract

Aims: Text classification has emerged as an important approach to advancing Natural Language Processing (NLP) applications that work with the text available on the web. Many applications for analyzing such text have been proposed in the literature.

Background: NLP, aided by deep learning, has achieved great success in automatically sorting text data into predefined classes, but this process is expensive and time-consuming.

Objectives: To overcome this problem, this paper studies and implements various machine learning techniques to build an automated system for movie review classification.

Methodology: The proposed methodology uses the Bidirectional Encoder Representations from Transformers (BERT) model for data preparation and makes predictions with several machine learning algorithms: XGBoost, support vector machine, logistic regression, naïve Bayes, and a neural network. The algorithms are compared on accuracy, precision, recall, and F1 score.
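
The four performance metrics named in the methodology can be illustrated with a minimal sketch. It assumes binary sentiment labels (1 = positive, 0 = negative); the helper name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper or its experiments.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

Here precision is the share of predicted-positive reviews that are truly positive, recall is the share of truly positive reviews that are recovered, and F1 is their harmonic mean.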

Result: The results reveal that the neural network with two hidden layers outperforms the other models, achieving an F1 score above 0.90 within the first 15 epochs and 0.99 in just 40 epochs on the IMDB dataset, thus greatly reducing the time required.
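
To make the phrase "two hidden layers" concrete, the following is a rough sketch of the forward pass of such a classifier applied to a fixed-length review embedding. The layer sizes, the ReLU and sigmoid activations, the random initialisation, and the 768-dimensional stand-in vector are all assumptions made for illustration; the abstract does not specify the architecture, and the network below is untrained.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_positive_probability(embedding, layers):
    """Forward pass: input -> hidden 1 (ReLU) -> hidden 2 (ReLU) -> sigmoid output."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(embedding, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    logit = dense(h2, w3, b3)[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
EMBED_DIM, HIDDEN_1, HIDDEN_2 = 768, 64, 16  # assumed sizes, not from the paper
layers = [
    init_layer(EMBED_DIM, HIDDEN_1, rng),
    init_layer(HIDDEN_1, HIDDEN_2, rng),
    init_layer(HIDDEN_2, 1, rng),
]
# Random stand-in for a BERT-derived review vector; a real pipeline would supply this.
fake_embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
p = predict_positive_probability(fake_embedding, layers)
print(f"P(positive) = {p:.3f}")
```

In practice the weights would be learned over training epochs, and a review would be labeled positive when the output probability exceeds a chosen threshold such as 0.5.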

Conclusion: The neural network attains 100% accuracy, a 15% improvement in accuracy and a 14.6% improvement in F1 score over logistic regression.
