An Improved β-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Pages: 296-306 (11 pages)

Abstract

Background: As the volume of text documents on Internet pages grows, handling such a large amount of information becomes increasingly complex. Text clustering is a common optimization problem in which a large collection of text documents is partitioned into a set of coherent clusters of similar documents.

Aims: This paper presents a novel local clustering technique, namely β-hill climbing, which is modeled to solve the text document clustering problem by assigning similar documents to the same cluster.

Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as the k-medoid and k-means techniques, have been successfully applied to the text document clustering problem.
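
As an illustration of how a β parameter can balance local and global search, the following is a minimal Python sketch and not the authors' implementation. It assumes the commonly described form of β-hill climbing: a neighbourhood move, then a β step that re-draws each decision variable at random with probability β, then greedy selection. The solution encoding (one cluster label per document), the fitness function (mean cosine similarity to the cluster centroid), and all names and default values (`fitness`, `beta_hill_climbing`, `beta=0.05`, `iterations`) are assumptions made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(labels, docs, k):
    """Assumed objective: mean cosine similarity of each document to its
    cluster centroid (higher is better)."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for label, doc in zip(labels, docs):
        counts[label] += 1
        for j, x in enumerate(doc):
            sums[label][j] += x
    centroids = [[s / c for s in row] if c else row
                 for row, c in zip(sums, counts)]
    total = sum(cosine(doc, centroids[label])
                for label, doc in zip(labels, docs))
    return total / len(docs)


def beta_hill_climbing(docs, k, beta=0.05, iterations=500, seed=0):
    """Return (labels, fitness) after a fixed number of iterations.
    With beta=0.0 the beta step never fires, leaving plain hill climbing."""
    rng = random.Random(seed)
    n = len(docs)
    current = [rng.randrange(k) for _ in range(n)]
    current_fit = fitness(current, docs, k)
    for _ in range(iterations):
        candidate = current[:]
        # Neighbourhood step (local search): move one random document
        # to a randomly chosen cluster.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # Beta step (global search): each label is re-drawn at random
        # with probability beta.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_fit = fitness(candidate, docs, k)
        # Greedy selection: accept only if the candidate is at least as good.
        if candidate_fit >= current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit


# Toy term-weight vectors: two documents about one topic, two about another.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, score = beta_hill_climbing(docs, k=2)
print(labels, round(score, 4))
```

In this sketch a small β keeps the search mostly local, while a larger β makes each candidate closer to a random restart; the value that works best on real document collections would have to be tuned experimentally.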

Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing outperformed the original hill climbing technique on the text clustering problem.

Conclusion: Adding the β operator to hill climbing improves text clustering performance.

Keywords: Text clustering, β-hill climbing, local exploitation search, optimization problem, clusters, k-means techniques.
