Abstract
Background: As the volume of text documents on Internet pages grows, managing such a
large amount of information becomes increasingly difficult. Text clustering is a common
optimization problem in which a large collection of text documents is partitioned into a
set of coherent clusters of similar documents.
Aims: This paper presents a novel local search clustering technique, β-hill climbing, for the
text document clustering problem. The technique is modeled so that similar documents are
partitioned into the same cluster.
Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was
introduced to balance local and global search. Local search methods, such as the k-medoid
and k-means techniques, have been successfully applied to the text document clustering
problem.
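The interplay between a local neighbourhood move and the β operator can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the solution encoding (one cluster label per document), the fitness (mean cosine similarity of each document to its cluster centroid), the strict-improvement acceptance rule, the function names, and the default β = 0.05 are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(docs, labels, k):
    """Assumed objective: mean cosine similarity of documents to their centroid."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for doc, c in zip(docs, labels):
        counts[c] += 1
        for j, x in enumerate(doc):
            sums[c][j] += x
    # Centroids are only built for non-empty clusters.
    centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(k)
    ]
    return sum(cosine(d, centroids[c]) for d, c in zip(docs, labels)) / len(docs)


def beta_hill_climbing(docs, k, beta=0.05, iterations=2000, seed=0):
    """Hypothetical sketch: hill climbing plus a beta-controlled random reset."""
    rng = random.Random(seed)
    n = len(docs)
    current = [rng.randrange(k) for _ in range(n)]
    current_fit = fitness(docs, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # Local search: move one randomly chosen document to a random cluster.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # Beta operator (global search): with probability beta, each label is
        # reset to a random cluster. beta = 0 gives plain hill climbing.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_fit = fitness(docs, candidate, k)
        # Accept the candidate only if it strictly improves the fitness.
        if candidate_fit > current_fit:
            current, current_fit = candidate, candidate_fit
    return current, current_fit


# Toy usage: four 2-term "documents" forming two visibly distinct groups.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, score = beta_hill_climbing(docs, k=2)
print(labels, round(score, 3))
```

In this sketch, a larger β makes each candidate differ more from the current solution, which favours global search; a smaller β keeps candidates close to the current solution, which favours local search.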
Results: Experiments were conducted on eight standard benchmark text datasets with different
characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results
showed that the proposed β-hill climbing outperformed the original hill climbing technique on
the text clustering problem.
Conclusion: Adding the β operator to hill climbing improves the performance of text
clustering.
Keywords:
Text clustering, β-hill climbing, local exploitation search, optimization problem, clusters, k-means techniques.